Mike Ault's thoughts on various topics, Oracle related and not. Note: I reserve the right to delete comments that are not contributing to the overall theme of the BLOG or are insulting or demeaning to anyone. The posts on this blog are provided “as is” with no warranties and confer no rights. The opinions expressed on this site are mine and mine alone, and do not necessarily represent those of my employer.

Wednesday, November 30, 2005

Becoming Non-Dimensional

No, I am not talking about switching into some Star Trek universe; however, this may help deliver warp-speed performance!

Dimensionless Star Schema (Bitmap Star)

Introduction

In standard data warehousing we are taught to utilize a central fact table surrounded by multiple dimension tables. The dimension tables contain, for lack of a better description, report headers and maybe a description; usually they are very lean tables. In Oracle9i and later these dimensions are then related to the fact table by means of bitmap indexes on the foreign key columns in the fact table.

Examination of a Star Schema

So on closer examination we have three major components in a star:

  1. Central Fact table
  2. Dimension tables
  3. Bitmap indexes on foreign keys


Note that in Oracle9i and later the actual foreign keys do not have to be declared, but the bitmap indexes must be present.

So when a star is searched, the outer dimension tables are scanned first, then the bitmaps are merged based on the results of those scans, and finally the fact table is sliced and diced based on the bitmap merges. Generally the dimension tables have little more than the lookup value, maybe a description, maybe a count or sum. In most cases the measures stored in the dimension tables can be easily derived, especially counts and sums.
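The three steps just described can be sketched as a toy model. This is only an illustration in Python, not how Oracle implements bitmap indexes internally: each bitmap is held in an integer, and the table names, column names, and data (fact_rows, region_dim, product_dim) are all made up for the example.

```python
# Toy model of a star search. A bitmap is a Python int in which
# bit i is set when fact row i holds the indexed key value.
# All tables, columns, and values here are hypothetical.
fact_rows = [
    {"region_id": 1, "product_id": 10, "amount": 100},
    {"region_id": 2, "product_id": 10, "amount": 250},
    {"region_id": 1, "product_id": 20, "amount": 75},
    {"region_id": 2, "product_id": 20, "amount": 300},
    {"region_id": 1, "product_id": 10, "amount": 50},
]
region_dim = {1: "EAST", 2: "WEST"}          # key -> description
product_dim = {10: "WIDGET", 20: "GADGET"}   # key -> description


def build_bitmap_index(rows, column):
    """Return {key value: bitmap of the rows holding that value}."""
    index = {}
    for row_number, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << row_number)
    return index


def bitmap_for_keys(index, keys):
    """OR together the bitmaps of every key the dimension scan returned."""
    bitmap = 0
    for key in keys:
        bitmap |= index.get(key, 0)
    return bitmap


region_index = build_bitmap_index(fact_rows, "region_id")
product_index = build_bitmap_index(fact_rows, "product_id")

# Step 1: scan the dimension tables for the keys that match the filters.
east_keys = [key for key, name in region_dim.items() if name == "EAST"]
widget_keys = [key for key, name in product_dim.items() if name == "WIDGET"]

# Step 2: merge (AND) the bitmaps selected by each dimension.
merged = (bitmap_for_keys(region_index, east_keys)
          & bitmap_for_keys(product_index, widget_keys))

# Step 3: visit only the fact rows whose bit survived the merge.
matching = [row for i, row in enumerate(fact_rows) if (merged >> i) & 1]
total_amount = sum(row["amount"] for row in matching)
```

Note that in this sketch the dimension tables contribute nothing except the translation from a description to a key; everything after step 1 touches only the bitmaps and the fact rows.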

This all points to an interesting thought: in the case where the dimension table actually adds nothing to the data available, but merely serves as a scan table for values stored in the database, why not eliminate it altogether and simply create the bitmap index on the fact table as if it did exist? Many dimension tables can be eliminated in this fashion.

A New Star is Born

Eliminating most, if not all, of a fact table's dimension tables but leaving the bitmap indexes in place on what were the foreign keys leaves us with what I call a bitmap star. By eliminating the dimension tables we reduce our maintenance and storage requirements and may actually improve the performance of our queries, which now simply do a bitmap merge operation to resolve the query, eliminating many unneeded table and index operations from the now defunct dimensions.

Essentially you are flattening the structure into a potentially sparse fact table that is heavily bitmap indexed. The bitmap indexes become the “dimensions” in this new structure.
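As a rough sketch of that flattened structure, again a toy Python model rather than anything Oracle-specific, the fact rows below carry the values themselves and the per-column bitmap indexes stand in for the dimensions. The names flat_fact_rows, bitmap_dimensions, and query are hypothetical, chosen only for this example.

```python
# Toy model of a "bitmap star": no dimension tables, only a flat fact
# table and one bitmap index per former foreign-key column.
# All names and data are hypothetical.
flat_fact_rows = [
    {"region": "EAST", "product": "WIDGET", "amount": 100},
    {"region": "WEST", "product": "WIDGET", "amount": 250},
    {"region": "EAST", "product": "GADGET", "amount": 75},
    {"region": "WEST", "product": "GADGET", "amount": 300},
    {"region": "EAST", "product": "WIDGET", "amount": 50},
]


def build_bitmap_index(rows, column):
    """Return {value: bitmap}, where bit i is set if row i holds the value."""
    index = {}
    for row_number, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << row_number)
    return index


# The bitmap indexes play the role of the dimensions.
bitmap_dimensions = {
    column: build_bitmap_index(flat_fact_rows, column)
    for column in ("region", "product")
}


def query(rows, indexes, **predicates):
    """Resolve equality predicates purely by merging (ANDing) bitmaps."""
    merged = (1 << len(rows)) - 1  # start with every row selected
    for column, value in predicates.items():
        merged &= indexes[column].get(value, 0)
    return [row for i, row in enumerate(rows) if (merged >> i) & 1]


west_gadgets = query(flat_fact_rows, bitmap_dimensions,
                     region="WEST", product="GADGET")

# The distinct values a dimension table would have listed are still
# available as the keys of the corresponding bitmap index.
region_values = sorted(bitmap_dimensions["region"])
```

Compared with a conventional star, there is no lookup step before the merge: the predicate values index straight into the bitmaps.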

Caveat Emptor!

However, this structure is not suitable for all star schemas, but it can be applied in a limited fashion to the many where values are stored in duplicate in both the fact and dimension tables. The applicability of this structure has to be determined on a case-by-case basis; it is not a suggested one-size-fits-all solution for all data warehouses.

If I get some spare time, I will create some test cases to compare this new structure with a traditional star schema. In tests of the bitmap star we achieved sub-second response time utilizing Oracle Discoverer against a 2.5 million row bitmap star table built on a 7-disk RAID5 array. The fact table had 6 bitmap indexes and one 5-column primary key index. Only a single “normal” dimension table was required, due to a needed additional breakout of values on one of the columns in the fact table.

2 comments:

David Aldridge said...

Mike, I think that much of the value of this derives from avoiding the hash join back to the fact table. I don't think that the initial operation of the star transformation, something like:

select ...
from my_fct ...
where my_fct.fk in
(select pk from my_dim where pk in ('A',...,'X'))

is too onerous.

Actually there are some tools that will avoid this join automatically (Microsoft) or through user configuration (Business Objects), which can give us the best of both worlds:

i) avoiding the expensive join
ii) also providing a selectable list-of-values (generally required except where the data set is something like "all dates in a known range").

This is also similar to the concept of "degenerate dimensions", where a dimensional value adds no information that cannot be derived from other dimensional values (for example, where a transaction id is defined by the concatenation of a transaction date, a location code, and a product code).

Mike said...

I quite agree. Another saving is in the maintenance of the system. With the bitmap star (b-star?), all you now have to do is update the base table and rebuild the bitmaps.