I had to explain to a fellow DBA the impact of null on processing where the columns that where null were also used in joining of tables. It took a while, but finally he caught on. It made me wonder how many folks are confused by this. The major issue is that nulls are not tracked in normal indexes. It is true that while bitmap indexes may be used to track null values, they have their own issues for OLTP environments and usually have very restricted use in OLTP environments.
I decided to develop a little test case just to show what is going on with NULLs. Let’s take a look at this test case:
SQL> create table test as select * from dba_objects;
Table created.
SQL> desc test
Name Null? Type
----------------------------------------- -------- ----------------------------
OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(128)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(18)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
SQL> create index test_ind on test(object_id);
Index created.
SQL> analyze table test compute statistics;
Table analyzed.
SQL> set autotrace on
SQL> select count(*) from test where object_id is not null;
COUNT(*)
----------
33732
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 INDEX (FAST FULL SCAN) OF 'TEST_IND' (NON-UNIQUE) (Cost=
19 Card=33732 Bytes=134928)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
79 consistent gets
0 physical reads
0 redo size
381 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(*) from test where object_id is null;
COUNT(*)
----------
11
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=104 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'TEST' (Cost=104 Card=11 Bytes=44
)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
464 consistent gets
0 physical reads
0 redo size
379 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> create index fbi_test on test(nvl(object_id,-1));
Index created.
SQL> analyze table test compute statistics;
Table analyzed.
SQL> select count(*) from test where nvl(object_id,-1)=-1;
COUNT(*)
----------
11
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'FBI_TEST' (NON-UNIQUE) (Cost=2 Ca
rd=12 Bytes=48)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
379 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> update test set object_id=-1 where object_id is null;
11 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE (Cost=104 Card=11 Bytes=44
)
1 0 UPDATE OF 'TEST'
2 1 TABLE ACCESS (FULL) OF 'TEST' (Cost=104 Card=11 Bytes=44
)
Statistics
----------------------------------------------------------
2 recursive calls
78 db block gets
465 consistent gets
0 physical reads
9208 redo size
625 bytes sent via SQL*Net to client
548 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
11 rows processed
SQL> commit;
Commit complete.
SQL> select count(*) from test where object_id=-1;
COUNT(*)
----------
11
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'TEST_IND' (NON-UNIQUE) (Cost=2 Ca
rd=1 Bytes=4)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
379 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
1. We create a test table with nulls in the column OBJECT_ID
2. We then created a normal index on the OBJECT_ID column
3. Using IS NULL and IS NOT NULL based WHERE clauses we determined:
a. The IS NOT NULL type WHERE clause will use an index
b. The IS NULL type WHERE doesn’t use an index
4. Next we create a function base index using the NVL function to convert the NULL values into a default value.
When we use the NVL function in our SELECT in place of the IS NULL we get index usage and much better performance, however, this may lead to improper relationships between tables.
5. As a final step, we replace the NULL values in TEST with a default value of -1, now using that in our select instead of IS NULL we get response virtually identical with the use of NVL function based index.
Of course the issues we are trying to prevent with all this is processing of IS NULL type queries forcing such items as full table scans and causing developers to have to use OUTER JOIN type syntax to resolve NULL conditions between tables. By eliminating these OUTER JOIN and IS NULL processing steps which result in large HASH and full table scans in many cases, we can dramatically reduce the execution times of queries. Notice in the example how we reduced cost figures from 104 for a full table scan to 2 for an index (either our normal or the function based index) scan.
So, what should we take from this? Several conclusions can be reached:
- Whenever possible avoid the use of NULLable columns for columns that will be joined.
- Use default values in place of NULL, note that to avoid outer joins, both parent and child tables must have a value, for example if we had a parent table the related to the OBJECT_ID in our example, we would have an entry of (-1, ‘NO ID’) in that parent to resolve the -1 entries in the child.
- If IS NULL is required, consider replacing it with a NVL call and a function based index, while this may not help with outer joins, it will help with IS NULL type selects.
Have fun!
Mike
 

1 comment:
Sorry, these foreign languages (English...french...) can confuse one...:)
Mike
Post a Comment