Rebuilding Yum Database

Leave a comment

When starting Hadoop components on the Linux virtual machines on my laptop, I periodically get failures at startup. The culprit? A corrupt yum database. I’ve been running Hortonworks 2.0 recently and have used the Ambari GUI to start services. The failure I encounter looks like this:

notice: Finished catalog run in 0.65 seconds
err: /Stage[1]/Hdp/Hdp::Lzo::Package[64]/Hdp::Package[lzo 64]/Hdp::Package::Process_pkg[lzo 64]/Package[lzo]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install lzo' returned 1: rpmdb: Thread/process 3570/140380093466368 failed: Thread died in Berkeley DB library
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 -  (-30974)
error: cannot open Packages database in /var/lib/rpm

Error: rpmdb open failed

The following series of commands straighten out the situation I typically encounter.

# rm -f /var/lib/rpm/__db*
# db_verify /var/lib/rpm/Packages
# rpm --rebuilddb
# yum clean all

The command yum clean all will not work until the first 3 commands are run, should the case be the DB_RUNRECOVERY error show above.

vmware fusion fixed ip stuff

Leave a comment

As a presales engineer at a small big data company with our own database that runs on many platforms, I now use VMWare Fusion all the time.

Here’s some of the grunt work I do when creating new VMs for which I want a fixed IP address and a changed hostname.

If I make the new VM and it’s linux, I may use the VMWare rapid installation feature. Then I want to change the root password. That just requires a restart. During restart, hit any key on the keyboard to bring up the Linux GRUB screen. Then hit “e” on the keyboard to edit the start parameters. Use the down arrow to get to the line starting with “kernel”. Then hit e again to edit that line. Delete the last two words, “rhgb quiet” and replace them with “1”. Then hit enter to preserver your change and then b to boot. You will start in single user mode as root. From there, you can use the command “passed” to provide a new password for user root.

Then to change the hostname, I use these three steps on Centos Linux, after becoming root of course.
1. Use the hostname command to change the host name.
[root@localhost ~]# hostname yournewhostname.local

2. Edit /etc/sysconfig/network to include the new hostname.

3. Back on my Mac, I figure out what the static IP address will be. With VMWare Fusion 5 (and apparently 4 or higher) this involves editing the following file:
/Library/Preferences/VMware Fusion/vmnet8/dhcpd.conf

Here’s a great write up on what to do.

4. Edit /etc/hosts to include the new hostname. Also include the static ip you chose in step 3.

5. I’ve been using Centos for my VMs. I’ve been using the GUI “Network Connections” tool under the System – Preferences menu to create the static ip address. On the first screen, select your ethernet card. Mine shows up as eth0 on the Centos Linux VMs I’ve created. Then click the Edit button.

On the resulting screen, I click on the IPv4 tab.

Then I switch the method to Manual. I click Add and make the address my new static IP that I chose in step 3 above. Netmask is The gateway is the IP address of your Mac. I also make the DNS server the IP address for my Mac.

Privileges Needed to Backup Databases

Leave a comment

Creating backups are one of the most vital tasks in maintaining computer infrastructure. System administrators and Database Administrators in particular get saddled with this unglamorous task. Backing up an operating system (OS) generally does not require any special manipulation of the OS prior to sending OS files to a backup target such as tape, disk or deduplicated disk targets like Data Domain. A system administrator can do things the old fashioned way such as generating a tar file and sending it to the backup target. Or he or she can deploy sophisticated backup software such as EMC’s Avamar to administrate and automate the backup process across hundreds or thousands of computers.

In any event, backing up the OS generally requires the root or Administrator level privilege. And this is rarely a problem because the System Administration teams runs the backups, and those users typically have that privilege anyway.

Backing up databases presents a variety of challenges. One is that for many database systems the Database Administrator (DBA) must place the database in a special mode prior to creating the backup. Another is that on many popular database systems, the DBA must backup files that store the record of transactions since these are used to permit recovery to any point in time. Such files are known as transaction logs on SQL Server, Sybase, Postgres and DB2. They are called archive logs in Oracle, although older Oracle documents called them offline redo logs.

And another challenge is that the privilege needed to backup a database varies among different database vendors. If the DBA team is running the backups, the level of privilege needed within the database is not a problem. But if the database backups are getting run by another team, then the DBA team may be reluctant to give out a high level of privilege in order to backups. What to do? To date, there is not much flexibility in the privilege needed to do the database backup.

Here’s a table spelling out backup privileges needed by an internal database user running a backup.

ORACLE The database user executing the RMAN backup needs the SYSDBA role. The users SYS and SYSTEM have this role. Or a new user can be created, but it must have SYSDBA.
If using an RMAN catalog, the database user that connects to the RMAN catalog does NOT have to have SYSDBA. It does need the RECOVERY_CATALOG_OWNER role. In other words, one use can run the backup and another user can record the activity in the RMAN catalog.

And what about Oracle 12C? It’s possible that the privilege needed to backup the new pluggable database model will no longer need SYSDBA. I’ll update this post when 12C gets released.

SQL SERVER For an individual database, a SQL Server database user with the database level role db_backupoperator can perform a backup upon that database. NOTE that this privilege must be granted within each new database that gets added to the instance. The server level role sysadmin will allow a user to backup any database, even new ones. While db_backupoperator is much lower privileged than sysadmin, only sysadmin insures that new databases will get backed up. Using db_backupoperator requires explicitly granting the privilege on each new database.
SYBASE ASE The lowest level of privilege needed is the Operator role. Operator will permit backup and restore on any database in the instance.

The System Administrator (SA) privilege also provides this, but this is the highest level role.

Finally, database owner can do backup and restore, but this privilege must be granted individually for each database.

The Operator role will work with new databases, so it is a good choice.
DB2 You must have SYSADM, SYSCTRL, or SYSMAINT authority to use the backup and recover utilities in DB2. SYSADM is the most powerful role of the 3. SYSCTRL controls operations that affect system resources, but cannot update table data. SYSMAINT can perform maintenance operations on all individual databases within an instance, but like SYSCTRL, cannot update table data. SYSCTLR and SYSMAINT both offer a lower privileged account than SYSADM.

NOTE: I plan to add to the above table as I get more information on databases.

Creating an RMAN Recovery Catalog

Leave a comment

Need to create and RMAN Recovery catalog? Check the Creating a Recovery Catalog section of the Backup and Recovery User Guide document.

Steps listed there are:
1. Create a tablespace for the catalog
2. Create a user that will own the schema and assign to it the newly created tablespace.
3. Grant the RECOVERY_CATALOG_OWNER privilege to the user, and if 11g or higher, grant CONNECT as well.

SQL> create tablespace TOOLS datafile 'C:\ORADATA\RCATDB\TOOLS_01.DBF' size 100M;

Tablespace created.

SQL> create user RCAT identified by RCAT default tablespace TOOLS temporary tablespace TEMP quota unlimited on TOOLS;

User created.

SQL> grant recovery_catalog_owner to RCAT;

Grant succeeded.

SQL> grant connect to RCAT;

Grant succeeded.

4. Using the RMAN utility, login with the CONNECT CATALOG syntax as the recovery catalog schema owner and run CREATE CATALOG. This will build the schema. You can specify a tablespace if desired.

C:\Users\emcadmin> rman

Recovery Manager: Release - Production on Wed Jan 2 18:30:33 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect catalog RCAT/RCAT

connected to recovery catalog database

RMAN> create catalog tablespace tools;

recovery catalog created

At this point one can begin registering within the catalog databases to be backed up.

RMAN> connect target;

RMAN> register database; 

SQL for checking RMAN catalog or control file for backup information

Leave a comment

Do you need a quick query to check the local control file to see if backups succeeded? Use the below if there is no RMAN catalog involved.

Query to use when storing meta data in control file.
SID column will be 0 unless working with Oracle RAC.
select sid, object_type, status, 
round((end_time - start_time) * 24 * 60, 2) duration_minutes,  
to_char(start_time, 'mm/dd/yyyy hh:mi:ss') start_time, 
to_char(end_time, 'mm/dd/yyyy hh:mi:ss') end_time,
round((input_bytes/(1024*1024*1024)),2) input_gb, 
round((output_bytes/(1024*1024*1024)),2) output_gb
from v$rman_status
where operation = 'BACKUP';

If you are using the RMAN catalog, then you can run this query instead.

Query to use when storing meta data RMAN catalog.
select db_name, object_type, status, 
round((end_time - start_time) * 24 * 60, 2) duration_minutes,  
to_char(start_time, 'mm/dd/yyyy hh:mi:ss') start_time, 
to_char(end_time, 'mm/dd/yyyy hh:mi:ss') end_time,
round((input_bytes/(1024*1024*1024)),2) input_gb, 
round((output_bytes/(1024*1024*1024)),2) output_gb
from rc_rman_status
where operation = 'BACKUP';
where end_time  > (SYSDATE - 31)

quick peak at postgres post install on ubuntu

Leave a comment

I’m checking out how postgres works.

I installed it to 32-bit ubuntu version 12.04. Googling lead to multiple places for the appropriate apt-get commands.

The install created a user, postgres, that runs the postgres binaries. However, root owns the binary files.

A default instance got created. I haven’t learned yet if people often run multiple postgres instances per host.

The following processes run with the default installation:
postgres: writer process
postgres: WAL process
postgres: autovacuum launcher
postgres: stats collector process

The first process is the main postgres process, and it was launched with the -D parameter pointing to a specific directory and the -c parameter pointing to the full path of the postgres.conf file.

The writer process I’m surmising must write to data files, and the WAL process I’ve read elsewhere is the Write Ahead Log, similar to redo log writer in Oracle. Autovacuum launcher governs the ability to automatically run the VACUUM command, which is needed in Postgres periodically. And I’m sure the stats collector updates query optimization stats, but I’ll have to check.

There’s a command psql that is the equal of sqlplus. I’ll explore psql in a follow up post.

Documentation for postgres can be found at

Having worked as a SQL Server and Oracle DBA, keeping track of database storage is important. Documentation for those two products describes early on how each system places all objects into datafiles. A datafile can contain tables, indexes, stored procedures, views and everything else.

Postgres on the other hand relegates discussion of physical storage to a location fairly deep in the documentation. Each table gets its own datafile. A master directory tree contains all the object in the postgres database, with most objects getting their own separate file. And postgres dictates the directory structure, although perhaps in more advanced deployments users can control some aspects. The filenames have a number which is automatically generated by postgres. My instance installed to /var/lib/postgresql/9.1/main. There are multiple sub-directories below that.

Done writing for now, but I’m going to create some tables, bang around with psql and try out the gui admin tool pgadmin III.

learning postgres

Leave a comment

I work at EMC in the Backup and Recovery Services (BRS) division, and we use postgres. It powers our backup software catalog for Avamar. We use it as a database repository for Data Protection Advisor (DPA). And it was the first database to be virtualized automatically by VMware’s vFabric Data Director.

In the Big Data landscape, postgres pops up all the time. Greenplum uses it. Netezza uses it. Hadapt, a newcomer to the big data space uses it. I think maybe Platfora uses it but by this point my head is spinning and I can’t even remember where I read that. And Cloudera uses it to store management data.

And probably a bizillion other peices of software use it.

I’m interested in EMC and in big data so I’m going to start learning postgres.

I’ll finish up with the “What about mysql?” question. In general, I’d always read that mysql is easy to learn, fast by default and deployed widely for small web apps. And that postgres is slower, but more reliable and feature rich. Some recent browsing reminded me that MySQL has corporate backing, first from its original corporate owners, then from Sun and now from Oracle, which currently owns it. And Postgres remains 100% open source. Finally, mysql allows users to pick their storage engine while Postgres provides for just one.

mysql vs postgres links for reference:
One on
A stackoverflow question with responses
An ancient article still getting traffic
A blog posting by Chris Travers

Older Entries