Mike Ault's thoughts on various topics, Oracle related and not. Note: I reserve the right to delete comments that are not contributing to the overall theme of the BLOG or are insulting or demeaning to anyone. The posts on this blog are provided “as is” with no warranties and confer no rights. The opinions expressed on this site are mine and mine alone, and do not necessarily represent those of my employer.

Tuesday, February 17, 2009


Well, I am on my way home from the Rocky Mountain Oracle Users Group Training Days event. I presented a paper titled “Is Oracle Tuning Obsolete,” a copy of which can be found on the http://www.rmoug.org/ site or at http://www.superssd.com/. While I was there I attended two presentations on the Oracle/HP Exadata Database Machine, one by Kevin Closson and another by Tom Kyte, both of Oracle.

My only complaint about both presentations was that when they presented user test results they neglected to show the full (or even partial) configurations of the servers and disk systems they had tested against. That is rather like saying my car is 10 times faster than Joe’s while telling you mine is a 1995 Dodge Avenger and failing to mention that Joe’s is a Stanley Steamer. Be that as it may, I still enjoyed the presentations, and the best takeaway was from Kevin’s presentation, when he said that “If your current system is fully tuned, has adequate disk resources, and is performing well, the Exadata has nothing to offer you.” An example from Kevin would be a 128-CPU Superdome with 128 4GFC HBAs fed by ample XP storage, as that configuration would be capable of ingesting about 51 GB/s. Also, during Tom’s presentation he admitted that the primary target of the Exadata is those shops with row after row of Oracle servers followed by a Netezza or Teradata server (or several).

Essentially the Exadata Database Machine is targeted at the larger (several terabytes) data warehouse that would otherwise be placed on a Netezza or Teradata machine, and I couldn’t agree more. However, it would be a fun test to replace the disks in an Exadata cell with a RamSan-500 and see what (if any) additional performance could be gained. After all, the disks are still the limiting factor in the performance of the system. For example, a single Exadata cell tops out at around 2,700 IOPS, according to white papers on the Oracle site; a single RamSan-500 can sustain 100,000 mixed read/write IOPS and 25,000 pure write IOPS with minimal response times. As far as I can tell, no additional smarts are built into the Exadata disk drives in the form of special firmware, such as is supposedly done with EMC systems, so replacing the drives with a single RamSan-500, set up either as 12 LUNs or as a single large LUN, should be easy.
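
To put those quoted figures side by side (both numbers come from vendor white papers, not a benchmark I have run myself), a quick bit of arithmetic shows how many cells’ worth of disk IOPS a single RamSan-500 represents:

```python
# Rough ratio of the IOPS figures quoted above; white-paper numbers,
# so treat this as a ballpark comparison only.
cell_iops = 2_700            # single Exadata cell, per the Oracle white papers
ramsan_mixed_iops = 100_000  # RamSan-500 sustained mixed read/write IOPS

print(ramsan_mixed_iops / cell_iops)  # roughly 37 cells' worth of disk IOPS
```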

Another interesting discussion I had during this trip was with our (Texas Memory Systems) own Matt Key, one of our Storage Applications Engineers, about why adding Enterprise Flash Drives (EFDs) to arrays produces little if any benefit for write-heavy workloads. It turns out there is an upper limit on the bandwidth a single disk tray can handle, and with EFDs in place of disk drives the tray tops out at around 3,000 IOPS (between 1,600 and 3,200, based on a 64K stripe), so you actually need several trays (with a maximum of only 4 drives to a tray because of other limits) to get significant write IOPS. For comparison, the RamSan-500 can handle 25,000 sustained write IOPS. Now don’t get me wrong, the EFDs can improve the performance of certain types of loads when compared to a standard array with no EFDs, but if you are write-heavy you may wish to consider other technologies. Note: The calculations are based on a 200 megabyte/second FC-AL loop bandwidth with 64K writes; since RAID 6 is used, each logical write generates two 64K writes, so 200 MB/s ÷ 64K = 3,200 IOPS and 200 MB/s ÷ 128K = 1,600 IOPS. These limitations apply to all array-based EFDs.
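
That note can be restated as a couple of lines of back-of-the-envelope arithmetic, assuming the 200 MB/s loop and the two-physical-writes-per-logical-write RAID 6 behavior described above:

```python
# Sanity check of the FC-AL tray limit discussed above.
loop_bw_kb_per_s = 200 * 1024   # 200 MB/s loop bandwidth, in KB/s
write_kb = 64                   # 64K stripe writes
raid6_writes_per_write = 2      # two physical writes per logical write, as noted

best_case_iops = loop_bw_kb_per_s // write_kb                         # 3200
raid6_iops = loop_bw_kb_per_s // (write_kb * raid6_writes_per_write)  # 1600

print(best_case_iops, raid6_iops)  # 3200 1600
```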

The RamSan-500 makes an excellent complement to any enterprise array, especially if you use preferred-read technology to read from the RamSan-500 while writing to both, for example when you are using array-based replication, such as SRDF, to geo-mirror the frame to a remote site. By offloading the reads, the number of writes the array can support increases by a factor tied to the percentage of reads in the workload, thus increasing the performance of the entire system. As an example, if you have an 80/20 read/write workload and you offload the 80 percent of reads to the RamSan, the array is freed to handle a factor of 4 more writes, up to the actual maximum IOPS of the array. This is a 4X increase in I/O with zero impact to infrastructure or BCVs.
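
The 80/20 arithmetic works out like this (the 10,000 IOPS array rating below is a hypothetical number, just for illustration):

```python
# Illustration of the read-offload math above.
array_iops = 10_000          # hypothetical total IOPS rating of the array
write_fraction = 0.20        # 80/20 read/write workload

writes_before = array_iops * write_fraction   # writes while also serving reads
writes_after = array_iops                     # all capacity freed for writes
increase = writes_after / writes_before - 1   # the "factor of 4 more writes"

print(writes_before, writes_after, increase)  # 2000.0 10000 4.0
```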

Oh, on February 24-25 I’ll be in Charlotte, NC, presenting at the Southeast Oracle Users Convention (SEOUC). My two presentations are “My Ideal Data Warehouse System” and “Going Solid: Use of Tier Zero Storage in Oracle Databases.” I hope to see you there!

As I digest more of the information I obtained this week, I will try to write more blog entries. So for now I will sign off. Good bye from 37,000 feet over Colorado!


Wednesday, February 04, 2009

Do You Need Solid State Technology?

Many times I am asked the question “Should I buy solid state devices for my system?” and each time I have to answer “It depends.” Of course the conversation evolves beyond that point into the particulars of their system and how they are currently using their existing IO storage subsystem. However, the question raised is still valid: do you need SSD technology in your IO subsystem? Let’s look at this question.

SSD Advantages

SSD technology has one big advantage over your typical hard disk based storage array: SSD does not depend on physical movement to retrieve data. Because no physical movement is involved, the latency of each data retrieval operation drops significantly, usually on the order of a factor of 10 (for Flash-based technology) to over 100 (for RAM-DDR-based technology). Of course cost increases as latency decreases with SSD technology, with Flash running about a quarter of the cost of RAM-DDR technology.

SSD Costs

Flash and DDR-based SSD technology are usually on a par with, or can be cheaper than, IO-equivalent SAN-based technology. Due to the much lower latency of SSD technology you can get many more input/output operations per second (IOPS) from it than you can from a hard disk drive system. For example, from the “slow” Flash-based technology you can get 100,000 IOPS with a worst-case average latency of 0.20 milliseconds. From the fastest DDR-based technology you can achieve 600,000 IOPS with a latency of 0.015 milliseconds.

To achieve 100,000 IOPS from hard drive technology you would need around 500 or more 15K RPM disks at between 2 and 5 milliseconds latency, given a peak of around 200 IOPS per disk drive for random IO, regardless of storage capacity. A 450 gigabyte 15K RPM disk drive may have 4 or more individual platters, with a read-write head for each side of each platter; however, these read-write heads are mounted on a single armature and are not capable of independent positioning. This limits the latency and IOPS to that of a single platter with two heads, so a 146 gigabyte 15K RPM drive will have the same IOPS and latency as a 450 gigabyte 15K RPM drive from the same manufacturer (http://www.seagate.com/docs/pdf/datasheet/disc/ds_cheetah_15k_6.pdf).
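
The spindle-count arithmetic above can be restated in a few lines (200 IOPS per drive is the per-spindle figure used in the text):

```python
# Random IOPS scale with spindle count, not capacity.
import math

target_iops = 100_000
iops_per_15k_drive = 200     # peak random IOPS per 15K RPM drive, as above

drives_needed = math.ceil(target_iops / iops_per_15k_drive)
print(drives_needed)  # 500
```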

Given that IOPS and latency are the same regardless of capacity across the 146 to 450 gigabyte range of drive sizes, why not pick the smallest drive and save money? The reason is that to get the best latency you need to be sure not to fill the various platters in the disk drive (from 2 to 4) more than about 30 percent, hence the need for so many drives. So to get high performance from your disk-based IO subsystem you need to throw away 60-70% of your storage capacity!
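
Here is what that 30 percent fill rule does to usable capacity, using the 500-drive count and the smallest (146 gigabyte) drive from above:

```python
# Usable capacity when each drive is kept no more than 30% full.
drives = 500
drive_gb = 146
fill_pct = 30                # keep each drive under ~30% full for best latency

raw_gb = drives * drive_gb
usable_gb = raw_gb * fill_pct // 100
discarded_pct = 100 - fill_pct

print(raw_gb, usable_gb, discarded_pct)  # 73000 21900 70
```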

Do I Need That Many IOPS?

Many critics of SSD technology state that most systems will never need 100,000 IOPS, and in many cases they are correct. However, in testing a 300 gigabyte TPC-H (data warehouse) type load on SSD, I was able to reach peaks of over 100,000 IOPS using a simple 4-node Oracle 11g Real Application Clusters setup. Since many systems are considerably larger than 300 gigabytes and have more users than the 8 with which I reached 100,000 IOPS, it is not inconceivable that, given the capability to achieve 100,000 IOPS of throughput, many current databases would easily exceed that value. It must also be realized that the TPC-H system I was testing utilized highly optimized indexing, partitioning, and parallel query technology; eliminate any of these capabilities and the IOPS required increases, sometimes dramatically.

Questions to Ask

So now we reach the heart of the matter: do you need SSD for your system? The answer depends on several questions that only you can answer:

1. Is your performance satisfactory? If yes, then why are you asking about SSD?
2. Have you maximized use of memory and the optimization technologies built into your database system? If no, then do this first.
3. Has your disk IO subsystem been optimized? (Enough disks and HBAs?)
4. Is your system spending an inordinate amount of time waiting on the IO subsystem?

If the answer to question 1 is no, and the answers to questions 2, 3, and 4 are yes, then you are probably a candidate for SSD technology. Don’t get me wrong: given the choice, I would skip disk-based systems altogether and use 100% SSD in any system I bought. However, you are probably locked into a disk-based setup with your existing system until you can prove it doesn’t deliver the needed performance. Let’s look closer at the four questions.
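
For fun, the four questions reduce to a few lines of logic (a toy checklist, not a sizing tool; the names are mine):

```python
# Toy encoding of the four-question checklist above.
def ssd_candidate(perf_ok, memory_maximized, disks_optimized, io_bound):
    """True when the answers point toward SSD, per the discussion above."""
    if perf_ok:
        return False          # question 1: performance is fine, stop asking
    if not memory_maximized:
        return False          # question 2: tune memory and the database first
    return disks_optimized and io_bound   # questions 3 and 4

print(ssd_candidate(perf_ok=False, memory_maximized=True,
                    disks_optimized=True, io_bound=True))  # True
```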

Question 1 is sometimes hard to answer quantitatively. Usually the answer is more of a gut reaction than anything that can be put on paper. The users of the system can usually tell you if the system is as fast as they need it to be. Another consideration for question 1: performance may be fine now, but what if you grow by 25-50%? If your latency is 3-5 milliseconds on average now, adding more load may drive it much higher.

Question 2 will require analysis of how you are currently configured. An example from Oracle is that waits on db file sequential read can indicate that not enough memory has been allocated to cache the data blocks read via index lookups. So even if the indexes are cached, the data blocks are not and must be read into the cache on each operation. A sequential read is an index-based read followed by a read of the data table, and it usually should be satisfied from cache if there is sufficient memory. Another Oracle wait, db file scattered read, indicates full table scans are occurring. Usually full table scans can be mitigated by use of indexes or partitioning. If you have verified your memory is being used properly (perhaps everything that can be allocated has been), and you have utilized the proper database technologies, and performance is still bad, then it is time to consider SSD.

A key source of wait information is of course the Statspack or AWR report for Oracle-based systems. An additional benefit of the Statspack and AWR reports is that both can contain a cache advisory sub-section that helps determine whether adding memory will help your system. By examining the waits and the cache advisory section of the report you can quickly determine if adding memory will help your performance. Another source of information about the database cache is the V$BH dynamic performance view. V$BH contains an entry for every block in the cache, and with a little SQL against the view you can easily determine whether there are any free buffers or whether you have used all available and need more. Of course, use of the automated memory management features in 10g and 11g limits the usefulness of the V$BH view. The Oracle Grid and Database Control interfaces (provided you have the proper licenses) also give you performance advisories that will tell you when you need more memory. Of course, if you have already maximized the size of your physical memory, most of this is moot.

Question 3 may have you scratching your head. Essentially, if your disk IO subsystem has reached its lowest latency, and the number of IO channels (as determined by the number and type of host bus adapters) is such that no channel is saturated, then your disk-based system is optimized. Usually this shows up as latency in the 3-5 millisecond range combined with high IO waits and low CPU usage.

For question 4, look at your system CPU statistics: if your CPUs are underutilized and IO waits are high, the system is waiting on IO to complete before it can continue processing. A Unix or Linux based system in this condition may show high run-queue values even when the CPUs are idle.

SSD Criticisms

Other critics of SSD technology cite problems with reliability and possible loss of data with Flash and DDR technologies. For some forms of Flash and DDR they are correct: if the Flash isn’t wear-leveled properly, or the DDR is not properly backed up, data can be lost. However, as long as the Flash technology utilizes proper wear leveling, and the RAM-DDR system uses proper battery backup with permanent storage on either a Flash or hard disk based subsystem, those complaints are groundless.

The final criticism of SSD technology is usually that the price is still too high compared to disks. I look at an advertisement from a local computer store and I see a terabyte disk drive for $99.00; it is hard for SSD to compete with that low base cost. Of course, I can’t run a database on a single disk drive. Given our 300 gigabyte system, I was hard-pressed to get reasonable performance placing it on 28 15K RPM high-performance disk drives; most shops would use over 100 drives to get that performance. So in a single-disk-to-SSD comparison, yes, cost would appear to be an issue; however, you must look at other aspects of the technology. To achieve high performance, most disk-based systems utilize specialized controllers and caching technology and spread IO across as many disk drives as possible, so that only 20-30% of each disk drive is ever actually used; this is known as short-stroking the drives. The disks are rarely used individually; instead they are placed in a RAID array (usually RAID 5, RAID 10, or some exotic RAID technology). Once the cost of additional cabinets, controllers, and other support technology is added to the base cost of the disks, not to mention any additional firmware costs added by an OEM, the costs soon level out between SSD and standard hard drive SAN systems.

In a review of benchmark results, the usual ratio between the capacity deployed to achieve performance and the capacity actually needed is 40-50 to 1, meaning our 300 gigabyte TPC-H system would require at least 12 terabytes of storage, spread over at least 200 disk drives, to provide adequate performance. By contrast, an SSD-based system would only need a ratio of 2 to 1 (to allow for the indexes and support files).
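
Using the figures above, the raw-capacity comparison is straightforward (40:1 is the low end of the quoted range):

```python
# Disk vs SSD raw capacity for the 300 GB TPC-H system discussed above.
data_gb = 300
disk_ratio = 40              # low end of the 40-50:1 deployed-to-needed range
ssd_ratio = 2                # allows for indexes and support files

disk_raw_gb = data_gb * disk_ratio   # 12,000 GB, i.e. about 12 terabytes
ssd_raw_gb = data_gb * ssd_ratio     # 600 GB

print(disk_raw_gb, ssd_raw_gb)  # 12000 600
```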

In addition to the base equipment costs, most disk arrays consume a large amount of electricity, which in turn results in larger heat loads for your computer center. In many cases SSD technology consumes only a fraction of the energy and cooling of regular disk based systems, providing substantial electrical and cooling cost savings over its lifetime. SSD by its very nature is green technology.

When Doesn’t SSD help?

SSD technology will not help CPU-bound systems. In fact, SSD may increase the load on overworked CPUs by reducing IO-based waits. Therefore it is better to resolve any CPU loading issues before considering a move to SSD technology.

In Summary

The basic rule for determining if your system would benefit from SSD technology is that if your system is primarily waiting on IO then SSD technology will help mitigate the IO wait issue.