
Tuesday, October 11, 2011

Minimizing Rebuild Time with Benchmark Factory from Quest

In my job as Oracle Guru for Texas Memory Systems I perform a multitude of benchmarks to showcase the performance of our SSD products. One tool I have found invaluable for these benchmark tests is Benchmark Factory from Quest Software, Inc., as it allows TPC-C, TPC-E and TPC-H benchmarks, as well as custom and other recognized benchmarks, to be easily performed, and it provides decent reporting of the results. However, there is one fly in the ointment.

Benchmark Factory does great when the benchmark size is small; however, the time to reload the database grows dramatically as the size of the database increases. For example, the creation of a 3000 warehouse TPC-C schema takes over 4 hours using a 16-CPU server with plenty of memory and SSD storage. I can only imagine how long it would take on a smaller server using conventional hard drive based SAN or JBOD storage. So, how can this obstacle be overcome?

First, I had Quest provide parallel load scripts that allow me to load all 9 TPC-C tables independently. The standard load utility loads the tables serially, and any error means restarting the load process from the beginning. With the independent load scripts you can load any combination of the tables, restarting any that fail due to space or undo problems. These are the load scripts that produced the 4 hour time mentioned above, and they are available on the Quest website. Still, spending 4 hours to refresh a database in order to perform a 45 minute test seems excessive. Is there anything else that can be done to reduce this time further?

The http://www.tpc.org website provides the dbgen utility, which can be used to generate flat load files; you then create a set of SQL*Loader control files to map them into the database and load the tables in parallel, which reduces the load time to about an hour. You can also map the flat files in as external tables. Rather than deal with learning a second utility (which you have to download and link as a C routine), I simply loaded the tables once using BMF and then, before touching the data, created a set of external tables using simple CTAS statements with the external table clauses required. Now my load time is down to about an hour, and I could reduce it further by parallelizing specific table loads. Note that to use ORACLE_DATAPUMP external tables as a source to insert from, you must set the FILESYSTEMIO_OPTIONS parameter to NONE. And what about statistics? Gathering statistics on multi-million row tables can take a couple of hours as well.
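Since FILESYSTEMIO_OPTIONS is a static parameter, it has to be changed in the spfile and the instance bounced; a minimal sketch, assuming the instance runs from an spfile:

alter system set filesystemio_options = NONE scope = spfile;
-- the change takes effect only after an instance restart
shutdown immediate
startup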

Command to create a set of external table build scripts:
(you must use the “create directory” command to create the ext_tab directory entry first; a sketch of that command follows the example result script below)

select 'create table ext_'||table_name||'
organization external (
type oracle_datapump
default directory ext_tab
location ('||chr(39)||'ext_'||table_name||'.ext'||chr(39)||')) AS
select * from '||table_name||';' from user_tables
/
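To turn the output of this query into a runnable script, spool it from SQL*Plus; a minimal sketch (the spool file name here is my own choice):

set heading off feedback off pagesize 0 linesize 512 trimspool on
spool build_ext_tabs.sql
-- run the SELECT shown above here
spool off
@build_ext_tabs.sql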

Example result script:

create table ext_C_HISTORY
organization external (
type oracle_datapump
default directory ext_tab
location ('ext_C_HISTORY.ext')) AS
select * from C_HISTORY;
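As noted earlier, the ext_tab directory object must exist before the CTAS will run; a minimal sketch, using a hypothetical operating system path and run as a suitably privileged user:

create directory ext_tab as '/u01/oradata/ext_tab';
grant read, write on directory ext_tab to tpcc;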

Following the data load and index build I gathered statistics using DBMS_STATS.GATHER_SCHEMA_STATS and then exported them into a statistics table. Since the statistics are stored, it is simply a matter of importing them at the conclusion of each reload.
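For reference, the gather call looked something like this (the parallel degree shown is an assumption, not necessarily the value I used):

EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'TPCC', degree => 16, cascade => TRUE);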

Commands to export statistics:

EXEC DBMS_STATS.CREATE_STAT_TABLE('TPCC', 'STATS_TABLE');
EXEC DBMS_STATS.export_schema_stats('TPCC','STATS_TABLE',NULL,'TPCC');

Command to import statistics:

EXEC DBMS_STATS.import_schema_stats('TPCC','STATS_TABLE',NULL,'TPCC');

So now my data reload simply becomes a set of TRUNCATE TABLE commands, followed by an INSERT with the APPEND hint using a SELECT * from each external table, followed by the statistics import. This cuts the turnaround time from around 6-8 hours to less than 2 for a 3000 warehouse TPC-C benchmark.

Example INSERT command:

insert /*+append*/ into c_customer select * from ext_c_customer;
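Putting it all together, the refresh for one table looks like the following sketch; the same truncate/insert pair is repeated for each of the 9 tables, and a direct-path insert must be committed before the table can be queried:

truncate table c_customer;
insert /*+append*/ into c_customer select * from ext_c_customer;
commit;
-- repeat for the remaining tables, then re-import the statistics:
EXEC DBMS_STATS.import_schema_stats('TPCC','STATS_TABLE',NULL,'TPCC');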
