Mike Ault's thoughts on various topics, Oracle related and not. Note: I reserve the right to delete comments that are not contributing to the overall theme of the BLOG or are insulting or demeaning to anyone. The posts on this blog are provided “as is” with no warranties and confer no rights. The opinions expressed on this site are mine and mine alone, and do not necessarily represent those of my employer.

Monday, October 09, 2006

That's a Secret, Agent Man!

Mike Ault, Copyright 2006

And if I told you I would have to kill you…well…probably not, so I will anyway. After my fun getting Grid Control for Oracle10gR2 re-installed and working, I needed to get the new and improved, more stable, more reliable, able-to-fix-your-coffee-and-press-your-suit intelligent agents running on my RAC nodes so I could monitor them. With their usual aplomb, Oracle has provided almost enough documentation to achieve this. Almost.

I downloaded the agent install files for Linux and then unzipped them on both of my Linux nodes. I also downloaded the proper install files for my XP server where the Grid Control was installed. First, I attempted the “silent” install, which involves running the following command:
agentDownload.linux -b /home/oracle/linux -c "aultlinux3,aultlinux4" -n crs -m rem208742 -r 4889
Here -b is the home for the agent, -c is the node list for the cluster, -n is the cluster name, -m is the master node for the OMS, and -r is the port to attach to on the OMS node. It started like gangbusters, then stalled, complaining that the “linux\oui\oui_linux.jar” file was missing. I checked on the XP box and sure enough it was missing; however, guess what, it was on the Linux platform where I started the command from! So I copied the file and directory over to the XP box and reran the command. Then it complained about the agent directory and its contents being missing. Notice a pattern here? Yep, it was on the Linux platform. So, being of almost sound mind, I decided that I needed to copy the entire directory structure that the instructions seemed to indicate belonged on the Linux side over to the XP side.

Once the directories were copied, the command was quite happy to recopy and uncompress all of the files back to the Linux side and then attempt to start the universal installer in silent mode, which it did. However…it insisted it needed a new inventory location (why it couldn’t just add its entries to the existing inventory is beyond my ken). Nothing I fed it made it happy. For example, it has a -i option to provide a file containing a pointer to an alternate inventory; however, as usual, the instructions say “file containing pointer to inventory location” but give no directions on formatting the entry. After several unsuccessful attempts I decided I had beaten the dead horse enough and fired off the “runInstaller” command to install in real time. I am pleased to say that, other than requiring a different inventory location, it worked fine, and soon my “intelligent” agent was running happily along.

However, a word about the emctl program is in order. There will be at least two versions of the program on a machine running the agent and Oracle database software: one is for the database control and resides in ORACLE_HOME/bin, and the other is for the Grid Control and resides in ORACLE_OMSHOME/bin. They are not interchangeable; you must use the ORACLE_OMSHOME/bin version to control the agent for the Grid Control. OK, back to the “intelligent agent”. A quick check of its status using the “emctl status agent” command on each node showed that, while the agent was indeed up and running, it had no clue how to talk to the OMS host: it couldn’t exchange heartbeats and couldn’t upload its various XML delicacies to the host.

It seems the OMS host (the XP box) had a full domain specification (it is a work machine and came configured that way), so when the agent installation asked for the host, it merrily supplied the host and full domain. Since my home office, while it is my domain, doesn’t have a domain of its own, the agent got a bogus address. In addition, while the installer started the agent, it didn’t start it as a secure agent. Another problem was that the XP host starts up its network connections with a firewall enabled that blocks all attempts to cuddle up to the machine from all comers.

So, first I turned off the firewalls on the network connections for the XP host. That at least let me communicate with the box (I could now ping it from both Linux boxes and get a reply). Next, I looked in the ORACLE_OMSHOME/sysman/config/emd.properties file and adjusted the various URLs there to eliminate the bogus domain specifier. Finally, I made sure all of the time zone specifications were set according to the ORACLE_OMSHOME/sysman/admin/supportedtzs.lst file. So now I had the proper host address in the REPOSITORY_URL entry and the proper time zone in the agentTZRegion entry, and it was time to secure start the agent.
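If you would rather not hand-edit the URLs, the domain clean-up can be sketched as a small shell function. This is only a sketch under assumptions: the function name strip_domain is made up for illustration, example.com stands in for whatever bogus domain your install picked up, and it assumes the entries are written in the usual scheme://host.domain:port/path form. It keeps a .bak copy of the file before touching it.

```shell
# strip_domain FILE DOMAIN
# Hypothetical helper: removes ".DOMAIN" from the host part of URL-valued
# entries (such as REPOSITORY_URL) in a properties file.
strip_domain() {
    file=$1
    domain=$2
    # Escape the dots so sed matches the domain literally.
    escaped=$(printf '%s' "$domain" | sed 's/\./\\./g')
    # Keep a backup of the original file next to it.
    cp "$file" "$file.bak" || return 1
    # Rewrite "://host.DOMAIN" to "://host"; lines without a URL are untouched.
    sed "s#\(://[^/:]*\)\.$escaped#\1#" "$file.bak" > "$file"
}
```

You would call it as strip_domain ORACLE_OMSHOME/sysman/config/emd.properties example.com, with the agent stopped, and then check the result by eye before restarting anything.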

I issued the following commands:

emctl stop agent
Then clean out all files relating to upload:
rm -r /sysman/emd/state/*
rm -r /sysman/emd/collection/*
rm -r /sysman/emd/upload/*
rm /sysman/emd/lastupld.xml
rm /sysman/emd/agntstmp.txt
rm /sysman/emd/blackouts.xml
rm /sysman/emd/protocol.ini

emctl secure agent (this reloaded connection data into the emd.properties file)
emctl clearstate agent
emctl upload
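
For anyone repeating this on several nodes, the sequence above can be wrapped up roughly as follows. Treat it as a sketch, not gospel: the function name resecure_agent and its AGENT_HOME argument are my own inventions for illustration, and it assumes the upload-related files live under sysman/emd inside the agent home (what I have been calling ORACLE_OMSHOME). It calls emctl by full path so the database control version can never be picked up by accident, and it refuses to run with an empty home so the rm commands cannot wander off to the root directory.

```shell
# resecure_agent AGENT_HOME
# Hypothetical wrapper around the stop / clean out / secure / upload sequence.
resecure_agent() {
    home=$1
    # Guard: an empty or wrong home would turn the rm calls below into
    # deletions under /, so bail out unless bin/emctl is really there.
    if [ -z "$home" ] || [ ! -x "$home/bin/emctl" ]; then
        echo "usage: resecure_agent AGENT_HOME (must contain bin/emctl)" >&2
        return 1
    fi
    emctl="$home/bin/emctl"   # the Grid Control agent's emctl, by full path

    "$emctl" stop agent || return 1

    # Assumption: the upload-related files sit under AGENT_HOME/sysman/emd.
    emd="$home/sysman/emd"
    rm -rf "$emd/state/"* "$emd/collection/"* "$emd/upload/"*
    rm -f "$emd/lastupld.xml" "$emd/agntstmp.txt" \
          "$emd/blackouts.xml" "$emd/protocol.ini"

    "$emctl" secure agent || return 1
    "$emctl" clearstate agent || return 1
    "$emctl" upload
}
```

The function stops at the first emctl step that fails, so a failed secure step does not get papered over by the later ones.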

Well, the upload failed. Notice the note on the emctl secure agent step about reloading connection data? When the emctl secure agent command executes, it goes out and gets the new connection data and password encryption and writes them into the file, bogus domain and all. Since I figured it was going to keep doing this in the future, I placed an entry in my /etc/hosts file to point the fully qualified host and domain at the proper IP address. I figured I finally had it licked, but an emctl status command quickly disabused me of this fallacy.
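
As an illustration of the hosts-file fix, here is a minimal sketch of a helper that appends the mapping only if the fully qualified name is not already there. Everything specific in it is an assumption: the function name add_hosts_entry is made up, and in the usage line below 192.0.2.10 and example.com are placeholders, not my real address or domain.

```shell
# add_hosts_entry FILE IP FQDN SHORTNAME
# Hypothetical helper: appends "IP FQDN SHORTNAME" to a hosts-format file
# unless FQDN already appears as a host name on a non-comment line.
add_hosts_entry() {
    file=$1
    ip=$2
    fqdn=$3
    short=$4
    # Scan fields 2..NF (the names) of every non-comment line for the FQDN.
    if awk -v name="$fqdn" '
        !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$file"; then
        return 0   # already present, leave the file alone
    fi
    printf '%s\t%s\t%s\n' "$ip" "$fqdn" "$short" >> "$file"
}
```

Run as root on each Linux node, something like add_hosts_entry /etc/hosts 192.0.2.10 rem208742.example.com rem208742 would do it, with your own address and domain filled in.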

A light went on over my head: this was a secure connection and I hadn’t exchanged certificates between the XP and Linux hosts. I pulled the URL from the REPOSITORY_URL setting and plugged it into a normal browser session; it pulled the proper certificate from the XP machine and placed it into my Linux boxes’ wallets. After that…voilà! Finally, after several hours of looking things up on Metalink and Google and plumbing the depths of my own somewhat poorly organized mind, I had a working Grid Control with remote agents monitoring my RAC cluster.

Some notes you may find useful from Metalink:

Note: 362199.1 – Concerns setting the hostnames properly
Note: 330932.1 – Concerns setting the REPOSITORY_URL and Time zone properly

Have fun!


Oliver_Griffiths said...

Very useful blog - Thanks!
I cannot find these two Bug IDs on Metalink, so I assume that they have been depublished!?

Is there a definitive reference or book on the agents and the grid - short and pithy as opposed to the documentation?

Oliver Griffiths.

Mike said...

Not that I know of. I have a friend who has written one on OEM in general; I will ask her if she has covered the agents there in detail.