Mike Ault's thoughts on various topics, Oracle related and not. Note: I reserve the right to delete comments that are not contributing to the overall theme of the BLOG or are insulting or demeaning to anyone. The posts on this blog are provided “as is” with no warranties and confer no rights. The opinions expressed on this site are mine and mine alone, and do not necessarily represent those of my employer.
Wednesday, January 07, 2009
Database Entomology in Oracle11g Land
As a pre-test I had created a small TPC-H (not more than 30 gigabytes) and on a single-server rig I could run the complete 22 query TPC-H set of queries in parallel query. Of course I wasn’t using high levels of partitioning and parallel query simultaneously with the small data set.
I knew it would be interesting when I got to query 9 on the 300 gigabyte dataset and on query 9 I got an ORA-00600:
ORA-00600: internal error code, arguments [kxfrGraDistNum3],,
When I ran an 8 stream randomized order query run I also periodically received:
ORA-12801: error signaled in parallel query server P009, instance dpe-d70:TMSTPCH3 (3)
ORA-00600: internal error code, arguments: [qks3tStrStats4], , , , , , , 
On query 18 once in a while but not every run. Due to not being a full customer (only having a partner CSI number) I was unable to report these as possible bugs. I did completely check out OTN and Metalink as well as Google and no one else seems to be having these issues, of course how many folk are running Oracle11g 126.96.36.199 or 188.8.131.52 with RAC, cross instance parallel query and heavy partitioning and sub-partitioning?
I had a quick look at the Oracle11g 184.108.40.206 release notes and saw a load of bug fixes and hoped mine were covered, even though a text search didn’t show up the ORA-00600 arguments I received. So I bit the Oracle bullet and performed an upgrade.
Well, actually 2 sets of upgrades. First I upgraded my home, 32-bit (this is important later) servers and other than the usual documentation gotchas and needing to set the database as exclusive, start it, stop it and then reset it as a cluster before the dbua program would run properly, I was successful and now have an Oracle11g 220.127.116.11 instance running on my home RAC setup. A quick run against the 30 GB database showing no really stellar improvements against my test setup using JBOD arrays for TPC-H and it successfully ran Query 9 against the non-partitioned, smaller 30 gigabyte data set.
Feeling pleased with the success I immediately set out the next morning to update my large test environment, my 64 bit cluster. The CRS update went smoothly, and other than some space issues (you may want to add a datafile to your SYSTEM tablespace) and some package problems (for some reason DBMS_SQLTUNE and DBMS_ADVISOR where missing) the database upgrade went fine, right up to the point of starting the instances under 18.104.22.168. It seems there is just a small bug with the use of the new MEMORY_TARGET parameter and release 22.214.171.124…you can’t go above 3 gigabytes! This is why I said that the upgrade on 32 bits was important to remember, in 32 bit systems you will rarely get above a 3 gigabyte SGA size once you allow for user logins, process space and operating system memory needs. However, one of the major reasons for going to 64 bit is to have SGA sizes in Oracle greater than 4 gigabytes. Now, if you go back to using the SGA_MAX_SIZE and SGA_TARGET or the full manual specifications such as SHARED_POOL_SIZE and DB_CACHE_SIZE you can get above the 3 gigabyte setting.
Another annoying thing with 11g and the MEMORY_MAX_SIZE setting is that you cannot exceed MEMORY_MAX_SIZE with the sum of your SGA settings plus PGA_AGGREGATE_TARGET. Now for those of you with small sort sizes this isn’t really a problem and you probably won’t have any issues. However, with a TPC-H you need a large sort size so you need a large PGA_AGGREGATE_TARGET but, you quickly get into trouble with the limits on MEMORY_MAX_SIZE and large PGA_AGGREGATE_TARGETS. With my 16 gigabytes of memory per server I was only able to allocate about 8 gigabytes to Oracle (actually about 7.5 in 126.96.36.199 and 3 in 188.8.131.52) anything larger and I would get errors. So needless to say, I turned off the total memory management and did it the old fashioned way.
Finally, I had my instances up, about a 7 gigabyte SGA with 5 gigabytes of DB_CACHE_SIZE and 5.5 gigabytes of PGA_AGGREGATE_TARGET. Oh, did I mention, your SHARED_POOL_SIZE must be at least 600 megabytes for Oracle11g 184.108.40.206? If it isn’t you will get 4030 errors on startup, I ended up with 750 megabytes worth.
So after 8 hours of upgrade time for my main set of instances I was finally ready to run a TPC-H. Guess what, with the upgrade in place, more cache and bigger sort areas, I seem to be getting worse performance than with the sub-optimal query resolving, bug-ridden 220.127.116.11 version. Looks like they fed the bugs instead of killed them. Oh well, back to the tuning bench.