
Monday, March 27, 2006

Further 1=2 Work

Added 4/4/2006 - Did a bit more probing via the 10gR2 explain plan tool. What Oracle actually does with the 1=2 is replace it with the predicate NULL IS NOT NULL. Simply use that instead of 1=2 to eliminate some parsing and confusion in your code. This also seems to indicate you can replace 1=1 with NULL IS NULL. Anyway, it's been some interesting research.
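If you want to confirm this on your own system, here is a minimal sketch against the TEST table used in the runs below; on 10gR2 the Predicate Information section of the DBMS_XPLAN output should show the rewrite:

explain plan for
select count(object_id) from test where 1=2;

select * from table(dbms_xplan.display);
-- look for: filter(NULL IS NOT NULL) in the Predicate Information section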

Finally having some spare time, I decided to perform a few more tests on the 1=2 situation. It seems to be another case of Oracle explain plans not matching what is actually happening in the database.

Observe the following:

SQL> create table test as select * from dba_objects;

Table created.

SQL> select count(*) from test;

COUNT(*)
----------
50115

SQL> create index test_idx on test(object_id);

Index created.

SQL> analyze table test compute statistics;

Table analyzed.

SQL> set autotrace on
SQL> select count(object_id) from test where 1=2;

COUNT(OBJECT_ID)
----------------
0


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=23 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 INDEX (FAST FULL SCAN) OF 'TEST_IDX' (INDEX) (Cost=23 Card=50115 Bytes=200460)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
310 bytes sent via SQL*Net to client
496 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

If we base our decision on cost, we see a cost of 23, but the true cost shows up in the statistics: only one recursive call, and no db block gets, consistent gets, or physical reads. Now, if we flush the buffer cache and shared pool, we see the following:

SQL> alter system flush buffer_cache;

System altered.

SQL> alter system flush shared_pool;

System altered.

SQL> select count(object_id) from test where 1=2;

COUNT(OBJECT_ID)
----------------
0


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=23 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 INDEX (FAST FULL SCAN) OF 'TEST_IDX' (INDEX) (Cost=23 Card=50115 Bytes=200460)

Statistics
----------------------------------------------------------
297 recursive calls
0 db block gets
43 consistent gets
7 physical reads
0 redo size
310 bytes sent via SQL*Net to client
496 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed

So when parsing is added to the cost, we see that recursive calls, consistent gets, and physical reads do play a part, but only for the first execution. Now, dropping the index, we see:

SQL> drop index test_idx;

Index dropped.

SQL> select count(object_id) from test where 1=2;

COUNT(OBJECT_ID)
----------------
0


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=138 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'TEST' (TABLE) (Cost=138 Card=50115 Bytes=200460)


Statistics
----------------------------------------------------------
169 recursive calls
0 db block gets
18 consistent gets
0 physical reads
0 redo size
310 bytes sent via SQL*Net to client
496 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
1 rows processed

And now with cache and pool populated:

SQL> select count(object_id) from test where 1=2;

COUNT(OBJECT_ID)
----------------
0


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=138 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'TEST' (TABLE) (Cost=138 Card=50115 Bytes=200460)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
310 bytes sent via SQL*Net to client
496 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Again we see that while the cost figure is several times higher than with the index, the true cost in recursive calls, db block gets, and physical reads is the same as with the index.

Now what about with a comparison to an impossible column value? Let’s see:

SQL> create index test_idx on test(object_id);

Index created.

SQL> analyze table test compute statistics;

Table analyzed.

SQL> alter system flush shared_pool;

System altered.

SQL> alter system flush buffer_cache;

System altered.

SQL> select count(object_id) from test where object_id=0;

COUNT(OBJECT_ID)
----------------
0

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'TEST_IDX' (INDEX) (Cost=1 Card=1 Bytes=4)

Statistics
----------------------------------------------------------
294 recursive calls
0 db block gets
44 consistent gets
18 physical reads
0 redo size
310 bytes sent via SQL*Net to client
496 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed

With parsing included, the cost in recursive calls, db block gets, and physical reads is very similar to the cost using 1=2. But how about once the statement is loaded and the buffer cache filled? Let's look:

SQL> /

COUNT(OBJECT_ID)
----------------
0

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'TEST_IDX' (INDEX) (Cost=1 Card=1 Bytes=4)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
310 bytes sent via SQL*Net to client
496 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

As you can see, the stated cost is much less, but the cost in consistent gets is 2. In timing, both statements showed an average response time of 0.05 seconds.
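If you want to repeat the timing comparison yourself, a sketch (your numbers will of course differ):

set timing on
select count(object_id) from test where 1=2;
select count(object_id) from test where object_id=0;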

Conclusions

Oracle explain plans don't always match what the database is doing, and costs can be next to useless in many situations; always look at the underlying statistics to see what is happening. Always test in your own environment to ensure that stated Oracle behavior remains constant from one OS to another.

As far as whether using 1=2 or a comparison to an indexed column with a value that can't exist is more efficient, 1=2 is marginally better at 0 consistent gets versus 2; however, in performance tests the two statements' performances were virtually indistinguishable.

Thursday, March 23, 2006

WHERE 1=2 and Parallel Isn't

In another blog out there in blogspace a fellow expert has called me on this tip. I can only report what we saw during our testing. First, we saw a reduction in full table scans as indicated in V$SQL_PLAN, with a reduction in the indicated bytes processed. Allowing that this may be just an artifact of Oracle reporting it would use a FTS if it actually tried to resolve the query, but really used predicate elimination to realize there was nothing to do, how does one explain away the drastic reduction in physical IO and logical IO we noticed in the Enterprise Manager screens after the changes? Anyway, don't take my word for it; no one has the absolute last word on anything Oracle unless they are peering at the source code for your release on your OS while making their pronouncements. Test the results on your machine and for your version, and go with what works in your environment. It's just that simple.

Found out some interesting things yesterday. First, while reviewing a client's Oracle application which had been ported from SQL Server, I noticed they were using the constructs "WHERE 1=1" and "WHERE 1=2" to build dynamic SQL for dynamic queries. In the first case, "WHERE 1=1" was always followed by a group of one or more "AND" clauses, so the optimizer would of course practice predicate elimination and remove it. In the second case, "WHERE 1=2" was being used merely to return the column headers while avoiding error handling.
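For those who haven't run into it, here is a minimal sketch of the pattern; the ORDERS table, CUSTOMER_ID column, and bind name are hypothetical, not the client's code:

declare
  l_customer_id number := 42;  -- hypothetical optional filter; null means "not supplied"
  l_sql         varchar2(4000) := 'select count(*) from orders where 1=1';
  l_count       number;
begin
  -- the dummy 1=1 predicate lets every optional condition be appended
  -- uniformly as "and ..." without tracking whether it is the first one
  if l_customer_id is not null then
    l_sql := l_sql || ' and customer_id = :b1';
    execute immediate l_sql into l_count using l_customer_id;
  else
    execute immediate l_sql into l_count;
  end if;
  dbms_output.put_line('rows: ' || l_count);
end;
/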

It is this second case, "WHERE 1=2", I want to talk about. Of course, since 1 never equals 2 you never get a result back; however, it forces a full table scan! In testing on 10gR1 (10.1.0.4) and 9iR2 (9.2.0.4), if there was no bitmap index on the table, using the "WHERE 1=2" construct resulted in a full table scan. I tried using a unique and a non-unique B-tree index and they were ignored; it seems only a bitmap index would allow the optimizer to determine that 1 will never equal 2. Why this is so is a mystery, since obviously the bitmap had nothing to do with whether 1=1 or 1=2.
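If you want to check the behavior on your own version, here is a sketch of the kind of test we ran (table and index names are mine, not the client's):

create table test2 as select * from dba_objects;
analyze table test2 compute statistics;
set autotrace on

select count(*) from test2 where 1=2;
-- with no index, or only B-tree indexes, we saw a full table scan here

create bitmap index test2_bmp on test2(object_type);
analyze table test2 compute statistics;

select count(*) from test2 where 1=2;
-- with a bitmap index in place the optimizer short-circuited the scan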

In SQL Server, the optimizer is smart enough to realize 1 will never equal 2 and there, it simply returns the column list without any other action. Oracle, you paying attention? Make a note!

What we ended up doing in this client's case was to replace the "WHERE 1=2" with a clause that equated the primary key of the table with an impossible value for that key. In this case the ID was being passed in as a GUID (a hexadecimal value), so we used a "WHERE KEY=HEX(00)" style predicate and got a low-cost unique index lookup instead of a costly full table scan.
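A sketch of the substitution with hypothetical names; since the real key was a GUID passed as hex, an impossible single-byte RAW value does the job:

-- before: never returns rows, but forced a full table scan on these versions
select order_id, status from orders where 1=2;

-- after: an impossible value against the indexed primary key gives a
-- low-cost unique index lookup and still returns zero rows
select order_id, status from orders where order_id = hextoraw('00');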

A second item of note: one of the tables in a SQL statement involved in a match against values passed into a global temporary table had a PARALLEL setting of 4, but the instance had a PARALLEL_MIN_SERVERS setting of 0, so there was no way that a parallel scan could be performed. However, the fun part in this situation comes when you get an explain plan. Using the command-line EXPLAIN PLAN you get a perfectly reasonable plan using an index scan on the permanent table and a full scan on the global temp. However, when you check this SQL in the V$SQL_PLAN view, it attempted a parallel query with a single query slave doing a full table scan on both the permanent and the temporary table!
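To see what the database actually intends to run, rather than what EXPLAIN PLAN predicts, pull the plan straight from V$SQL_PLAN; a sketch (the sql_text filter is a placeholder for your statement):

select p.id, p.operation, p.options, p.object_name, p.cost
from v$sql s, v$sql_plan p
where s.address = p.address
and s.hash_value = p.hash_value
and s.child_number = p.child_number
and s.sql_text like '%your_query_text%'
order by p.id;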

The proper plan for this non-parallel query being forced into a non-optimal parallel plan was also shown in the EM Database Control session SQL screen (since it is probably pulled from DBA_HIST_SQL_PLAN, which is a child of the V$SQL_PLAN view, this makes sense). However, the developer, using the PL/SQL Developer tool, was getting the bad explain plan and couldn't see the problem. We changed the PARALLEL setting to 1 on the table, and the plans then matched between the online explain plan and the V$SQL_PLAN explain plan.
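The fix itself is a one-line DDL change (table name hypothetical):

alter table match_table parallel 1;
-- or, to strip the attribute entirely:
-- alter table match_table noparallel;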

A second note for Oracle: if PARALLEL is set on a table and PARALLEL_MIN_SERVERS is set to 0, completely disregard the PARALLEL setting. In testing, a setting of 0 resulted in no P000 processes starting, so even with the resulting parallel plan the query was serialized.

And a final note: the plan generated by the EXPLAIN PLAN and AUTOTRACE commands should match the one generated into the V$SQL_PLAN view.

Mike

Friday, March 03, 2006

They AMM to Please...Sometimes

Just an interesting addition to this (3/16/2006): we saw this behavior at another client site on a 10gR2 Enterprise release of RAC as well. - Mike

Ran into another undocumented feature in 10gR2 Standard Edition using RAC today. On RedHat 4.0 with 4-CPU Opteron servers (2-chip, 4-core) using 6 gigabytes of memory in a 2-node RAC, the client kept getting ORA-07445 errors when their user load exceeded 60 users per node. At 100 users per node they were getting these errors, with a core dump and a trace file on each server, for each node, about twice per minute. There didn't seem to be any operational errors associated with it, but it seriously affected IO rates to the SAN and filled up the UDUMP and BDUMP areas quickly. Of course, when the BDUMP area fills up, the database tends to choke.

The client is using AMM with SGA_TARGET and SGA_MAX_SIZE set and no hard settings for the cache or shared pool sizes. Initially we filed an iTAR or SR or whatever they are calling them these days, but didn't get much response on it. So the client suffered until I could get on site and do some looking.
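For reference, you can confirm that kind of setup from SQL*Plus; a sketch (in a pure AMM configuration the shared pool shows no user-specified floor):

-- with AMM and no hard floors, shared_pool_size displays as 0
show parameter sga_target
show parameter sga_max_size
show parameter shared_pool_size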

I looked at the memory footprint, CPU footprint, and user logins and compared them to the incident levels of the ORA-07445. There was a clear correlation between the number of users and memory usage. Remembering that resize operations are recorded, I then looked in the GV$SGA_RESIZE_OPS DPV and correlated the various memory operations to the incidences of the ORA-07445. The errors only seemed to occur when a shrink occurred in the shared pool: we saw the error on node 1, where a shrink had occurred, and none on node 2, where no shrink had happened yet.
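Here is a sketch of the kind of query used to line the shrink operations up against the ORA-07445 timestamps:

select inst_id, component, oper_type, initial_size, final_size,
       status, start_time, end_time
from gv$sga_resize_ops
where component = 'shared pool'
order by inst_id, start_time;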

Sure enough, hard setting the SHARED_POOL_SIZE to a minimum value delayed the error so that it didn't start occurring until the pool extended above the minimum and then shrank back to it; however, not every time. We were able to boost the number of users to 80 before the error started occurring by hard setting the shared pool to 250 megabytes. A further boost of the shared pool size to 300 megabytes seems to have corrected the issue so far, but we will have to monitor this as the number of user processes increases. Note that you need to look at the GV$SGA_RESIZE_OPS DPV to see what resize operations are occurring, and the peak size reached, to find the correct setting on your system.
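The hard setting itself is just one command; a sketch with the value that finally worked at this site (size yours from the peak shown in GV$SGA_RESIZE_OPS):

-- this becomes a floor under AMM; SGA_TARGET still manages the rest
alter system set shared_pool_size = 300m scope=both sid='*';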

It appears that there must be some internal list of hash values that is not being cleaned up when the shared pool is shrunk. This results in the kernel expecting to find a piece of code at a particular address, looking for it, and not finding it; this generates the ORA-07445. Of course, this is just speculation on my part.

So for you folks using 10gR2 Standard Edition with RAC (not sure if it happens with non-RAC, non-Standard), either look at not using AMM, or be sure to hard set the SHARED_POOL_SIZE to a value that can service your expected number of users and their code and dictionary needs.