Mike Ault's thoughts on various topics, Oracle related and not. Note: I reserve the right to delete comments that are not contributing to the overall theme of the BLOG or are insulting or demeaning to anyone. The posts on this blog are provided “as is” with no warranties and confer no rights. The opinions expressed on this site are mine and mine alone, and do not necessarily represent those of my employer.
Wednesday, April 27, 2005
My thought was to use varray types and populate them with the various possible values, then use dbms_random.value to generate the index values for the various varrays. The count column was just a truncated call to dbms_random in the range of 1 to 600. All of this was of course placed into a procedure with the ability to give it the number of required values.
create or replace procedure load_random_data(cnt in number) as
define varray types;
define varrays using types;
declare loop interator;
initialize varrays with allowed values;
start loop 1 to cnt times
set dbms_rabdom seed to loop interator;
insert using calls to dbms_random.value(1-n) as the indice for the various varrays;
This generated the rows and then using a primary key constraint with the exceptions clause I would then be able to delete the duplicates leaving my good rows.
create exceptions table using $ORACLE_HOME/rdbms/admin/utlexcpt.sql
alter table tab_z add constraint primary key (x,y,z) exceptions into exceptions;
delete tab_z where rowid in (select row_id from exceptions);
Well, the procedure worked as planned. However, when I attempted to use the exceptions clause on a concatenated key it did not work after the first run where it left 99 out of 600 rows it ended up rejecting all of the rows in the table on all subsequent runs! I tried it several times, each time it rejected every row in the table, even though I could clearly see by examining the first hundred or so rows that they weren't all duplicates.
The left me the old fallback of using the selection of the rowid values and then deleteing the min(rowid) values:
delete from tab_z Z
where z.rowid > (
select min(z2.rowid) from tab_z Z2
where Z.X = Z2.X and
This worked as expected. Of course you have to generate X(n1)*Z(n2)*Y(n3)*tab(n4) rows it seems to get the n4 number of rows where n1-3 are the count of the number of unique values for the given column and n4 is the number of rows in the table. For example, for my 600 rows I ended up needing to generate 30000 rows to get 600 non-dupes on the three index values. In the 30000 rows I ended up with 600 sets of duplicates in the range of 9-80 duplicates per entry. I re-seeded the random function using the loop interator with each pass through the loop.
Of course after all the work with dbms_random I figured out that it would have been easier just to cycle through the possible values for the varrays in a nested looping structure. Of course that would have led to a non-random ordered table. For the work we were doing it would have been ok to do it that way, but I can see other situations where the random technique would be beneficial.