Managing OEM Repository Growth

Leave a comment

In my shop, we completed installing OEM agents on all the nodes to be monitored a few months back. However, the MGMT_TABLESPACE continues to grow. I figured that could be reasonable since I didn’t know the details of data retention parameters that govern when data gets purged. If a purge parameter was set to 6 months, but we had only reached our high number of targets 3 months ago, then naturally the database would keep growing.

I found these two links that explain purge parameters nicely.
How to change retention parameters for metrics history
https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=430830.1
How to change retention parameters for job history
https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=604978.1

There are three tables that contain the bulk of the data in the OEM tablespace MGMT_TABLESPACE. They are MGMT_METRICS_RAW, MGMT_METRICS_1HOUR and MGMT_METRICS_1DAY. They contain partitions that are named after time intervals, and OEM takes care of adding and dropping them. The default retention period for the raw data is 7 days, for the 1hour aggregate is 31 days and for the 1day aggregate is 365 days.

If you want to change the 7 day default period for MGMT_METRICS_RAW to 3 days, you must insert a record into the MGMT_PARAMETERS table to add the parameter mgmt_raw_keep_window.
INSERT INTO MGMT_PARAMETERS (PARAMETER_NAME, PARAMETER_VALUE) VALUES (‘mgmt_raw_keep_window’,’3′);
The companion parameters for the other two tables are mgmt_hour_keep_window and mgmt_day_keep_window.

OEM: Obscure Tools for Tough Problems

2 Comments

Two aspects of OEM (Oracle Enterprise Manager) that I’m getting more acquainted with…

Scripting
Even though OEM automates tasks and provides a GUI interface to make tasks easier, you still need scripting. You can use the OEM gui to fire off a SQL statement against 50 different databases. But what if you need to change the value OEM stores for the password for the database user with which OEM monitors all those instances? What if you want to change the monitoring threshold on a tablespace in 20 databases?

You need OEM’s command line scripting interface to pull off tasks like those. It’s called, gasp, emcli. You can get it configured quite quickly on your OEM management server. It can also be run from remote laptops, but that seems like a security risk to me. Interestingly, the emcli in OEM version 10.2.0.5.0 seems to have a wider variety of features than does emcli in 10.2.0.4.0 OEM. An example is a bunch of switches in the change password command that make it more secure.

Enterprise Manager Command Line Interface manual provides all the detailed documentation needed.

Diagnosing Problems: EM Diag Kit
Keeping OEM happy involves endless trips to Metalink to identify issues with various Oracle versions and configurations. Lately I’ve been trying to solve why the OEM agent stops and restarts itself on some hosts. My efforts with the folks at Metalink led today to a utility called the EMDiag Kit. This little dynamo scans the OEM repository to help the support folks figure out what’s going on with various OEM components. If you think a little EM diagnosis is needed on your OEM system, hop on to Metalink and check out article EMDiagkit Download and Master Index, Doc ID 421053.1.

Backup and Recovery with 11g Database Control

Leave a comment

11G Oracle Enterprise Manager Database Control now has an extensive GUI based interface to RMAN. You can view RMAN backups, run RMAN backups, check on status of jobs, schedule RMAN jobs, all within the OEM db control GUI.

But wait, there’s more!

There is also something called Data Recovery Advisor (DRA), which will AUTOMATICALLY detect database failures that require recovery, and will suggest multiple recovery approaches. It can then automatically create the needed RMAN script and then run it for you if you wish! Of course, you need to be in archive log mode and you need to have a viable RMAN backup complete.

This tool (DRA) can also be accessed from the RMAN prompt with commands like LIST FAILURE; ADVISE FAILURE, REPAIR DATABASE;

PLEASE NOTE: OEM 11g Database Control is different than OEM grid control. Highest version OEM grid control is 10.2.0.5. Database control is the standalone version of OEM grid control. The features discussed here are only available in the 11G standalone database control, not the 10g grid control. Fortunately, you can have both OEM 10g grid control monitoring your 11g database PLUS run the 11g database control. At least it worked for me. I’d like to double-check this with Oracle.

How do you get OEM 11g database control working? Use the emca tool.
If you don’t have it working already, these 10g instructions work for 11g:
Recreate Database Control Repository, by Jeff Hunter (thanks Jeff!)
You might hit this problem, which I did in my test db:
Meta link doc id (733710.1) EMCA on 11g Db Fails With: Listener is Not Up

How do you stop and start database control?
emctl start dbconsole
emctl stop dbconsole
emctl status dbconsole

How do you access database control once running?
https://host:port/em

How do you see DRA within OEM 11g Database Control?
Once you are in 11g Database Control, go to the “Availability” link to see all the backup and recovery options.
DRA can be accessed via the Availability : Perform Recovery : Oracle Advised Recovery section
It can also be launched from “Advisor Central”, a link found in the “Related Links” section on the bottom of many pages.

More links on DRA
Tim Hall’s great DRA review
Using DRA from 2 Day DBA guide
Repairing Data Corruptions from DBA guide

Organizing Oracle Enterprise Manager for Monitoring

Leave a comment

Oracle Enterprise Manager, aka OEM, aka Grid Control, has lots of parts and lots of screens and lots of log files and lots of features and lots of components and lots of meta-link entries and documentation. Planning, deployment and use of OEM made my head spin for months. I’ve only just now got an almost firm grip on how to organize our OEM deployment to meet the monitoring needs for our DBA team. I’m using OEM 10.2.0.4.0, and have some plans to upgrade to 10.2.0.5.0 over the next few months.

Requirements in my shop were:
1. Allow all DBAs to administer all databases, but be able in the future to limit specific DBAs to specific systems.
2. Allow for two separate on call rotations, and set things up so that On-Call Team A does not get pages at night for databases looked after by On-Call Team B.
3. Distinguish between production and non-production databases so that we can report on them separately and so that an individual DBA can get paged and emailed for a production database at 1AM, but will not get any pages at 1AM for a non-production database.
4. Be able to customize the monitoring threshold settings for individual databases.

I worked with OEM rules, administrators, roles, groups, schedules and metrics and policy settings to meet these requirements.

I created groups classified by business unit, environment and target type. For example: SALES_PROD_LISTENERS, SALES_NONPROD_LISTENERS, HR_PROD_DB, HR_NONPROD_DB, etc. Do this in the targets section.

I created rules also classified by business unit, environment and target type. SALES_PROD_LISTENER_AVAILABILITY, for example. I associated this rule with the matching group. This means that this rule will fire only against specific databases belonging to that group. Go to Preferences – Rules for this.

I created Roles for each target type and assigned all targets of that type to that role. ALL_LISTENERS, for example. If needed in the future I can change this to SALES_LISTENERS and MARKETING_LISTENERS if I need to. Go to Setup – Roles for this.

Each real life person has two accounts, one for production systems and one for non-production systems. These administrator accounts subscribe to the appropriate rules for the systems that belong to their on-call rotation. For example, the production account subscribes to the production rules, e.g. SALES_PROD_LISTENER_AVAILABILITY, but not SALES_NONPROD_LISTENER_AVAILABILITY. They need two accounts because schedules are linked to accounts. In that way, the production account gets pages and emails 24/7 for production systems, and the non-production account only gets notifications during the day for non-production systems, provided the schedules for each account are set up correctly and the accounts are subscribed to the correct rules. Manage Administrators on the Setup page. Each administrator must log in as themselves and then manage Schedules and Rule subscriptions using the links on the Preferences page. Lastly, individual DBAs can customize metric and policy settings for individual systems so that threshold settings are in line for that system. Change metric and policy settings on the target home page for the specific database. Link is at the bottom of the page.

Getting in Deep with OEM

Leave a comment

I’ve been working more with Oracle Enterprise Manager Grid Control at work. I’m currently working on a non-production deployment that we will keep around for purposes of testing patches, etc. Hope to get the production deployment done in a a month or so…

This product is massive! Getting familiar with all the components is similar to learning Oracle DBA for the first time. Every time you think you have reached a plateau of understanding, you realize there’s much more to grok to keep the environment working and collecting data happily.

Today I’ve been digging into collection issues. Some Oracle Homes did not show up, even though the agent was running OK on that node. Turns out the problem was “Daily Host / Inventory Configuration Collection Fails with ORA-1 [Unique Key Violated] errors”, metalink note NOTE:579735.1. Then I started working on “Warning During Collection Of Oracle Software Unknown External Name for the Following Patchset Version”. That one was solved by metalink NOTE:434167.1.