Mike Ault's thoughts on various topics, Oracle related and not. Note: I reserve the right to delete comments that are not contributing to the overall theme of the BLOG or are insulting or demeaning to anyone. The posts on this blog are provided “as is” with no warranties and confer no rights. The opinions expressed on this site are mine and mine alone, and do not necessarily represent those of my employer.

Sunday, May 29, 2005

Creating a Cross Tab Report

I see quite a few questions about creating a crosstab report. While I was working with a good friend, Gary Withrow, who works for Santa Cruz County, California, he showed me a great technique for doing crosstabs that he has given me permission to share. Essentially, it involves embedding expressions inside the COUNT or other aggregate functions available in Oracle.

In 9i and above, the CASE structure is of particular use in generating crosstabs; for example:

-- Crosstab of owner and object_type
--
col owner format a10
set numwidth 8
set lines 132
set pages 50
select DECODE(GROUPING(a.owner), 1, 'All Owners',
a.owner) AS "Owner",
count(case when a.object_type = 'TABLE' then 1 else null end) "Tables",
count(case when a.object_type = 'INDEX' then 1 else null end) "Indexes",
count(case when a.object_type = 'PACKAGE' then 1 else null end) "Packages",
count(case when a.object_type = 'SEQUENCE' then 1 else null end) "Sequences",
count(case when a.object_type = 'TRIGGER' then 1 else null end) "Triggers",
count(case when a.object_type not in
('PACKAGE','TABLE','INDEX','SEQUENCE','TRIGGER') then 1 else null end) "Other",
count(case when 1 = 1 then 1 else null end) "Total"
from dba_objects a
group by rollup(a.owner)
/

The above SQL will generate a crosstab report of object owners versus counts of the various types of objects they own, as well as totals across all columns:


(SQL result screenshot omitted)



As you can see, this is a very useful technique.

In older versions of Oracle, where CASE is not available, use DECODE instead:

select DECODE(GROUPING(a.owner), 1, 'All Owners',
a.owner) AS "Owner",
count(decode( a.object_type,'TABLE',1,null)) "Tables",
count(decode( a.object_type,'INDEX' ,1,null)) "Indexes",
count(decode( a.object_type,'PACKAGE',1,null)) "Packages",
count(decode( a.object_type,'SEQUENCE',1,null)) "Sequences",
count(decode( a.object_type,'TRIGGER',1,null)) "Triggers",
count(decode( a.object_type,'PACKAGE',null,'TABLE',null,'INDEX',null,'SEQUENCE',null,'TRIGGER',null, 1)) "Other",
count(1) "Total"
from dba_objects a
group by rollup(a.owner)
/

I have tested this in 8i, 9i and 10g and it works in all versions.

Thanks Gary for finding it and sharing it!

Wednesday, May 25, 2005

Using a HASH Signature in Data Warehouses

On my current assignment I am dealing with slowly changing dimensions, and I found an interesting technique I would like to share.

In data warehouses, one item that can be difficult to deal with is the slowly changing dimension. Slowly changing dimensions are dimensions whose values may alter over time, such as addresses, deployments, etc. Often the data warehouse manager wants to maintain a running history of these changes over time; this makes the dimension a type 2 dimension. The reason these are difficult to deal with is that determining what has changed becomes hard when there are more than a few fields in the dimension. The data warehouse manager has the choice of a tedious field-by-field verification of each record, or some other form of verification, to determine whether a record in the dimension has changed, thus requiring a new record insertion. When the new record is inserted into a slowly changing dimension, the old record is marked as invalid either by a flag or via a date field or set of date fields, allowing for a bi-temporal dimension.
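To make that concrete, here is a minimal sketch of what such a type 2 dimension might look like; the CUST_DIM table, its columns, and the SIGNATURE column (used later in this post) are hypothetical:

CREATE TABLE cust_dim (
  cust_key        NUMBER PRIMARY KEY,   -- surrogate key, one per row version
  cust_id         NUMBER,               -- natural key from the source system
  cust_name       VARCHAR2(100),
  cust_address    VARCHAR2(200),
  signature       RAW(32),              -- hash signature of the pertinent columns
  effective_start DATE DEFAULT SYSDATE, -- when this version became current
  effective_end   DATE,                 -- NULL while the row is still current
  current_flag    CHAR(1) DEFAULT 'Y'   -- 'Y' marks the current version
);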

One method for finding out whether a record has a changed field is to generate a CRC or hash over the pertinent sections of the record and store this signature with the record. Then, for each record that is to be inserted, a signature is calculated over the same sections. This new signature is compared to the stored signature: if the signatures match, the record is tossed; if the signature has changed, the new record is inserted, the old is marked as invalid, and the new is marked as the current record; if the new record matches none of the existing signatures, the record is inserted but no existing records are altered.
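In outline, a signature-based load against the hypothetical CUST_DIM table above might look like the following sketch (CUST_STAGE and CUST_DIM_SEQ are likewise hypothetical, and GET_SIGN is the wrapper function built later in this post):

-- Step 1: close out any current row whose stored signature no longer
-- matches the signature calculated from the incoming feed.
UPDATE cust_dim d
   SET d.effective_end = SYSDATE,
       d.current_flag  = 'N'
 WHERE d.current_flag = 'Y'
   AND EXISTS (SELECT 1
                 FROM cust_stage s
                WHERE s.cust_id = d.cust_id
                  AND get_sign(s.cust_name||s.cust_address) <> d.signature);

-- Step 2: insert a new current row for every staged record that has no
-- current row with a matching signature (covers both changed and brand-new
-- keys); unchanged records fail the NOT EXISTS test and are simply tossed.
INSERT INTO cust_dim
  (cust_key, cust_id, cust_name, cust_address, signature)
SELECT cust_dim_seq.NEXTVAL, s.cust_id, s.cust_name, s.cust_address,
       get_sign(s.cust_name||s.cust_address)
  FROM cust_stage s
 WHERE NOT EXISTS (SELECT 1
                     FROM cust_dim d
                    WHERE d.cust_id = s.cust_id
                      AND d.current_flag = 'Y'
                      AND d.signature = get_sign(s.cust_name||s.cust_address));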

However, calculating a CRC or hash can be difficult unless you know something about what Oracle provides in its various supplied packages. For example, the OUTLN_EDIT_PKG package contains the GENERATE_SIGNATURE procedure. The GENERATE_SIGNATURE procedure creates a 32-byte hash signature for a string value of up to 32K-1 bytes passed to it either internally or via a procedure call. If the pass-in is from a call in SQL, the length is limited to 4,000 bytes. Provided your critical columns are less than 4,000 bytes in concatenated length, you can generate a signature that can be used to compare records; in this scenario, each date or number field should be converted to its character equivalent using the appropriate function.
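For example, the input string might be assembled along these lines (a sketch; the column names are hypothetical and GET_SIGN is the wrapper function created below). Note the explicit format masks, which keep the generated string, and therefore the hash, independent of NLS session settings:

SELECT get_sign(cust_name
       || TO_CHAR(birth_date, 'YYYYMMDDHH24MISS')
       || TO_CHAR(credit_limit)) AS signature
  FROM cust_stage;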

To use OUTLN_EDIT_PKG.GENERATE_SIGNATURE, the first thing to do is make sure the user that owns the data warehouse schema (DWH) has the EXECUTE privilege on OUTLN_EDIT_PKG:

From the owner of the OUTLN_EDIT_PKG, the user SYS, grant the proper privilege:

GRANT EXECUTE ON outln_edit_pkg TO dwh;

In order to use the GENERATE_SIGNATURE procedure most effectively, I suggest wrapping it in a deterministic function, for example:

From the DWH user you could create a function similar to:

CREATE OR REPLACE FUNCTION get_sign (text VARCHAR2)
RETURN RAW
DETERMINISTIC
AS
  signature RAW(32);
  text2     VARCHAR2(32767);
BEGIN
  text2 := text;
  -- GENERATE_SIGNATURE cannot accept a NULL, so substitute a default value
  IF text2 IS NULL THEN
    text2 := '1';
  END IF;
  sys.outln_edit_pkg.generate_signature(text2, signature);
  RETURN signature;
END;
/

Note that this is just a bare-bones function; exception checking should be added.

Notice I use a default value of '1' if the input value is null. Because the function generates a hash signature, you cannot pass it a null value, so in the case of a null we change it to a '1'. You could, of course, use any string value you desire; just be sure to document it!
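A quick, illustrative sanity check of the null handling:

-- Because NULL is replaced with '1' inside the function, these two
-- expressions return the same signature:
SELECT get_sign(NULL) sig_null, get_sign('1') sig_one FROM dual;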

Once you have a deterministic function you can either use it in an INSERT into a table column, or use it to build a function based index. By using a function based index and query rewrite you can eliminate the need to have a column in the table. This allows you to regenerate the matching values without having to update the complete dimension each time columns are added to the dimension. Also, there are performance benefits to using a function based index or an internal column over doing a column by column match. Let's look at some examples:

First, we have our test table TEST; in this case I created it from the DBA_OBJECTS view using a simple CTAS.

SQL> DESC test
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------------
 OWNER                                              VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(128)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                          NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(18)
 CREATED                                            DATE
 LAST_DDL_TIME                                      DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)

SQL> SELECT COUNT(*) FROM test;

COUNT(*)
----------
32359

Once we have our test table and our GET_SIGN function calling GENERATE_SIGNATURE, we use the function to create a function based index:

SQL> CREATE INDEX test_fbi ON test(get_sign(owner||object_name||subobject_name));

Index created.

Before we can use query rewrite we must analyze the table and index, as well as set the proper initialization parameters (query_rewrite_enabled = true and query_rewrite_integrity = trusted).
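For example, both can be set at the session level (they can also be set instance-wide):

ALTER SESSION SET query_rewrite_enabled = TRUE;
ALTER SESSION SET query_rewrite_integrity = TRUSTED;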

SQL> ANALYZE TABLE test COMPUTE STATISTICS;

Table analyzed.

Let's look at a query that will use this index and see what the performance is like.

SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = get_sign(a.owner||a.object_name||a.subobject_name);

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=324
Bytes=11016)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=324
Bytes=11016)
2 1 INDEX (RANGE SCAN) OF 'TEST_FBI' (NON-UNIQUE) (Cost=2
Card=129)

Statistics
----------------------------------------------------------
74 recursive calls
0 db block gets
14 consistent gets
1 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'TEST_FBI' (NON-UNIQUE) (Cost=2
Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'TEST_FBI' (NON-UNIQUE) (Cost=2
Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Now let’s drop the index and re-run our query to check performance without it:

SQL> DROP INDEX test_fbi;

Index dropped.

SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = get_sign(a.owner||a.object_name||a.subobject_name);

OBJECT_ID
----------
35567

Elapsed: 00:00:01.04

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=138 Card=324
Bytes=11016)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=138 Card=324
Bytes=11016)

Statistics
----------------------------------------------------------
243 recursive calls
0 db block gets
526 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:01.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=138 Card=324
Bytes=11016)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=138 Card=324
Bytes=11016)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
486 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = get_sign(a.owner||a.object_name||a.subobject_name);

OBJECT_ID
----------
35567

Elapsed: 00:00:01.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=132 Card=324
Bytes=11016)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=132 Card=324
Bytes=11016)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
446 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

As you can see, the simple SELECT statement using the function based index completes in too small a time interval for Oracle's timing command to measure (the test was run several times, with identical results), while the SELECT that cannot use the index takes a consistent time of just over 1 second (the tests were run several times and the times averaged).

As you can see, even when the table is read from memory (physical reads = 0), the consistent gets for the full scan (446) are much higher than those for the index based selects (4), and the index based query performs better even though its first run required 74 recursive calls. The recursive calls were only present on the first run; after that they dropped to 0.

In situations where the call is repeated many times, such as for loading dimension tables, the performance improvement should be dramatic.

In other tests, using an added SIGNATURE column, a RAW(32), and an index on that column, we can get performance identical to the function based index and table combination:

First, using a modified SELECT against the SIGNATURE column, with no index:

SQL> alter table test add signature raw(32);

Table altered.

SQL> update test a set a.signature=get_sign(a.owner||a.object_name||a.subobject_name);

32359 rows updated.

SQL> commit;

Commit complete.

SQL> set autotrace on
SQL> set timing on
SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = a.signature;

OBJECT_ID
----------
35567

Elapsed: 00:00:00.08

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=116 Card=324
Bytes=7128)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=116 Card=324
Bytes=7128)

Statistics
----------------------------------------------------------
277 recursive calls
0 db block gets
534 consistent gets
3 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=116 Card=324
Bytes=7128)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=116 Card=324
Bytes=7128)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
486 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

So without an index, the use of a special SIGNATURE column performs worse than the table and function based index combination. However, when we add an index, the performance is the same in both the timing and the statistics.

SQL> create index test_ui on test(signature);

Index created.

Elapsed: 00:00:00.07
SQL> analyze table test compute statistics;

Table analyzed.

Elapsed: 00:00:02.05
SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = a.signature;

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=20)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=20)
2 1 INDEX (RANGE SCAN) OF 'TEST_UI' (NON-UNIQUE) (Cost=2
Card=1)

Statistics
----------------------------------------------------------
26 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=20)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=20)
2 1 INDEX (RANGE SCAN) OF 'TEST_UI' (NON-UNIQUE) (Cost=2
Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=20)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=20)
2 1 INDEX (RANGE SCAN) OF 'TEST_UI' (NON-UNIQUE) (Cost=2
Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

However, if there is a chance that columns will be added or removed, using a stored column inside the dimension will require the entire dimension to be updated a row at a time. In low row-count dimensions this may be acceptable; in large dimensions with thousands of rows it may be better to use the table and function based index solution.
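To make the maintenance difference concrete: with the function based index approach, picking up an added column (the NEW_COL name below is hypothetical) is just an index rebuild, while the stored-column approach forces every row to be rewritten:

-- Function based index approach: rebuild the index with the new column
-- list; no table rows are updated.
DROP INDEX test_fbi;
CREATE INDEX test_fbi ON
  test(get_sign(owner||object_name||subobject_name||new_col));

-- Stored-column approach: the whole dimension must be updated.
UPDATE test a
   SET a.signature =
       get_sign(a.owner||a.object_name||a.subobject_name||a.new_col);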

Friday, May 20, 2005

It was the best of times...

I had a bit of a start the other day. I had a call from my brother in Florida telling me my Father was in the hospital for some "tests". Turns out his heart, which has been beating fiercely for nearly 70 years, has decided to take a rest every now and then. Probably he'll need a pacemaker, no big deal. No big deal, right? Still, it gave me pause to think.

Over my life I've taken antibiotics, had inoculations (more than I care to remember, courtesy of Uncle Sam), and been in the hospital for stitches, an occasional kidney stone, etc. Even 80 years ago I might be dead instead of writing this. I know for a fact both my daughters and my wife, the central facets of my life, would be dead. Various infections, a blocked bowel requiring an emergency resection, other things that we would now take no notice of, polio, smallpox, a plethora of deadlies would have stolen them from me, if not vice versa.

People fantasize about simpler times. They can keep them. My Mother (god rest her) lost a brother; my Uncle lost a son; my wife's paternal grandmother lost two children. Good old days, right? My father-in-law, a retired doctor, told the story of a patient who had come in complaining of shortness of breath; a check showed walking pneumonia, a quick prescription, and poof! No more problem. Forty years earlier, it would more than likely have been a death sentence.

Perhaps we live in an age of information overload, where our lives are filled with stress and we run hither and yon after things our grandparents couldn't have conceived of. My great-grandmother, who passed away at 105 years of age, saw everything from wagon trains to space shuttles, all in all an incredible thing.

So many people seem to be longing for this age to end, whether through the "rapture" or whatever religious apocalyptic vision they hold. They seem to fear how far we've come and how far we are likely to go. Many have forgotten history, and how many times in the past the world has been a much more evil place than it is right now. Now we see the evil in full color and sound, while before it could take generations for it to be seen.

I for one would not trade the current time for any I could think of in history. That we are standing on the brink of average lifetimes greater than 100 years is wonderful; maybe if we have longer lives we won't be so greedy with our time, and maybe we will take the time to appreciate the good things instead of spending so much energy fleeing the bad. Maybe we will take the time to truly teach our children what they need to live, not just what they need to get by.

Anyway, just a few stray thoughts on this Friday in spring.

Wednesday, May 04, 2005

On the Demise of DBAs

Much is being said recently about how much the DBA job is changing and how soon there won't be a real need for hard-core DBAs. In a recent keynote (I was unfortunately called to a client site, so it was given in my stead by David Scott, the president of the GOUSER (Georgia Oracle Users) group) I addressed this issue. In an analysis of the changing DBA role I was able to identify that we have gone from the basic 13 items listed in the Version 6 Oracle Administrator's Guide description of the DBA job to at least two dozen responsibilities as of 10g. The original list looked like this:

1. Installing and upgrading Oracle server and application tools.
2. Allocating system storage and planning future storage requirements for the database.
3. Creating primary database storage structures once developers have designed an application.
4. Creating primary database objects (tables, views, indexes) once the application designers have designed an application.
5. Modifying the database structure as necessary, from information given by application developers.
6. Enrolling users and maintaining system security.
7. Ensuring compliance with Oracle licensing agreements.
8. Controlling and monitoring user access to the database.
9. Monitoring and optimizing the performance of the database.
10. Planning for backup and recovery of database information.
11. Maintaining archived data on appropriate storage devices.
12. Backing up and restoring the database.
13. Contacting Oracle Corporation for technical support

The revised list:

1. Installing and upgrading Oracle server and application tools.
2. Allocating system storage and planning future storage requirements for the database.
3. Creating primary database storage structures once developers have designed an application.
4. Creating primary database objects (tables, views, indexes) once the application designers have designed an application.
5. Modifying the database structure as necessary, from information given by application developers.
6. Enrolling users and maintaining system security.
7. Ensuring compliance with Oracle licensing agreements.
8. Controlling and monitoring user access to the database.
9. Planning for backup and recovery of database information.
10. Maintaining archived data on appropriate storage devices.
11. Contacting Oracle Corporation for technical support.
12. Management of object related features.
13. Determination of LOB storage options.
14. Assistance with RAID configuration.
15. Determination of proper index strategy (normal, reverse, IOT, bitmapped)
16. Education of Developers in Oracle features and their use.
17. Management of distributed environments.
18. Management of parallel server and parallel query.
19. Determine and manage partitions and sub-partitions.
20. Determine proper use of outlines and SQL profiles.
21. Create, manage and maintain resource groups.
22. Create, manage and maintain global temporary tables.
23. Create and manage materialized views, summaries and snapshots.
24. Monitoring and managing the automatic and dynamic sizing parameters.
25. Monitoring and managing the automated UNDO (it isn’t set and forget).
26. Monitoring and tuning RAC environments, especially the cluster interconnect.
27. Manage and maintain fine grained auditing (HIPAA/SOX requirements).
28. Manage and maintain row level security.
29. Manage and maintain fine grained access controls.

Add to the above monitoring and maintaining application servers, web servers, connection managers, LDAP and other servers, as well as the entire client-to-database environment; in many shops the DBA does it all.

Now there may be arguments on both sides about rolling some of the above into the original general categories, but when some commands require multiple pages just to describe (CREATE TABLE, for example), let alone show meaningful examples for, well, they need to be broken out as a responsibility.

However, as the features are improved I have no doubt they will automate the complete management of SQL, tables, indexes and tablespaces, as well as some memory and tuning parameters. So the DBA will give up items 12, 13, 23 and 24. Gee, how will the other 25 (26 if you include the final one added above) items fill our time? The death of the DBA has been greatly overstated.

A case in point: I am involved in producing some roll-ups for use in feeding Oracle Warehouse Builder for a 10g DWH project. The roll-ups are used to pre-flatten and tune the feeds; tuning is difficult if you use 100% OWB to do the flattening. Anyway, during verification that the flattening was being effective, I decided to test the Enterprise Manager SQL Tuning Advisor. Before tuning, the particular view being examined required 10 minutes to produce a count of its records. I put the SQL Tuning Advisor onto it, and 8 minutes later it came up with its recommendations. I implemented them and then tried the count again. Thirty-nine minutes later it came back with the count. It didn't exactly produce the desired performance improvement. To be fair, on the one I ran through before this, it cut a 4-minute run to a 2-minute run on a different view. Could I have done a better job tuning it manually? We shall see (as soon as I figure out how to get the "SQL Profile" it created removed!).

So, needless to say, I feel safe in saying that the imminent demise of the DBA, though much sought after by Oracle Corporate (I kid you not: I was at an ECO conference and sat through a keynote by an Oracle VP who obviously didn't realize he was talking to a room full of DBAs, and who with great relish described how Oracle was attempting to do away with the job of DBA), is still a long way off. It seems for each job they "take away" they give us three more.