Presenting at SQL Saturday in Madison and Chicago

This week I am back in Madison, where I will be presenting my newest session, “How to baseline IO performance for your next SQL Server,” at SQL Saturday #287 on March 29, 2014. My next engagement will be in Addison for SQL Saturday #291 Chicago on April 26, 2014.

Madison has a new location: 6000 American Parkway, Building A, Madison, WI 53783.

There are 3 great full-day in-depth training sessions with respected SQL Server professionals on Friday, March 28, at Globe University Madison East. Click a link for more information and registration!


Chicago SQL Saturday will be at the same great location: DeVry University Addison, 1221 N Swift Road, Addison, IL 60101.

There will also be 2 great full-day in-depth training sessions with respected SQL Server professionals on Friday, April 25, 2014. Click a link for more information and registration!

Better Performance Through Parallelism with Adam Machanic
Virtualization for SQL Server DBAs  with David Klee

Source: J. Crocker

SQLSaturday is a training event for SQL Server professionals and those wanting to learn about SQL Server. Admittance to this event is free; all costs are covered by donations and sponsorships.

Hope to see you there!

My road to PASS Summit 2012

It’s been a while since I decided to give back to the SQL Server community and start speaking at community events. I started on a journey as a speaker that has been incredibly rewarding, and even though most of the time I paid for it out of my own pocket, the knowledge and the friends that I made on this journey were worth every effort. Next week two of my dreams will come true: attending PASS Summit for the first time and presenting my session titled “Optimizing SQL Server I/O with Solid State Drives” at PASS on Friday, November 9, 2012, 1:00 PM – 2:15 PM in room 302-TCC.
To put things in perspective, this year has proved to be truly amazing: being mentored by Joe Sack (blog|@josephsack) from SQLskills, attending my third SQLskills Immersion Event – IE3, and speaking at 6 SQLSaturday events in Madison WI, Chicago IL, Indianapolis IN, Iowa City IA, Kalamazoo MI and Minneapolis MN. On a personal note, I completed the Pewaukee triathlon and a century ride (100 miles of cycling) with the Bicycling Club of Lake County.

I am extremely honored to be selected to speak at PASS Summit 2012, the greatest SQL Server conference of the year, where I will join more than 150 of the industry’s leading speakers while having the unique chance to meet over 2,400 database professionals. I would like to thank the PASS Program Committee and speaker review team for their support and for selecting my session.

I hope to see you next week at the PASS Summit, and I hope you can make it to my session or just come by and say hi at one of the other events.




A SSD Technology a Day (8) – eMLC and MLC-HET

This entry is part 9 of 9 in the series One SSD Technology a Day

In the 2nd post of this series we explored the differences between SLC and MLC and saw that the main issue with MLC is endurance, which in the past prevented its use in enterprise applications. Because the increased capacity of MLC makes it a good fit for enterprise use, flash memory manufacturers looked for ways to improve the endurance characteristics of MLC memories. When analyzing why MLC cells fail sooner than SLC cells, the main culprit was the tight reference voltage window: after a number of flash write cycles, the actual charge left in the cell overlaps a reference level, leading to an incorrect value being read from that cell. When that happens a few times, the cell is marked as bad.
The solution was to make the programming cycles more precise in order to widen the interval around the reference levels and allow more room for error as the memory cell degrades. In addition, the silicon dies are tested and only the ones with better endurance characteristics are selected for enterprise use. This flash memory has been marketed as eMLC or MLC-HET (high endurance).

eMLC reference levels - Image courtesy of

This memory has improved endurance over consumer-grade MLC, with one downside: programming time increases in order to allow for the more precise reference levels.
The average number of write cycles for this type of memory is between 10K and 30K.
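To make the trade-off concrete, here is a back-of-the-envelope sketch. The numbers (total voltage window, programming tolerances) are my own illustrative assumptions, not vendor specifications: with a fixed voltage window, the read margin around each reference level shrinks as the number of states grows, and tighter programming buys some of that margin back.

```python
def read_margin(window_volts, states, program_tolerance):
    """Margin left between a programmed charge and the nearest
    reference level when `states` levels are packed into a fixed
    voltage window. Simplified, evenly spaced model."""
    level_spacing = window_volts / states          # gap between adjacent levels
    return level_spacing / 2 - program_tolerance   # room left for charge drift

WINDOW = 4.0  # volts; assumed total programming window

# SLC: 2 states, loose programming still leaves a large margin
slc = read_margin(WINDOW, states=2, program_tolerance=0.3)

# Consumer MLC: 4 states with the same loose programming
mlc = read_margin(WINDOW, states=4, program_tolerance=0.3)

# eMLC-style: same 4 states, but a tighter programming tolerance
emlc = read_margin(WINDOW, states=4, program_tolerance=0.1)

print(f"SLC margin:  {slc:.2f} V")   # 0.70 V
print(f"MLC margin:  {mlc:.2f} V")   # 0.20 V
print(f"eMLC margin: {emlc:.2f} V")  # 0.40 V
```

In this toy model eMLC doubles the drift the cell can tolerate compared to consumer MLC, at the cost of the slower, more careful programming step.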

This post comes to you from the shores of Lake Michigan in Muskegon where we are spending the weekend at a cottage in the company of good friends. In order to continue this SSD saga I found myself forced to write this using the WordPress iPhone app much like the character played by Robin Williams in the movie RV. Please excuse any spelling errors that you might find.

A SSD Technology a Day (7) – Intelligent Bad Block Management

This entry is part 8 of 9 in the series One SSD Technology a Day

In previous posts I talked about the wear leveling and RAISE algorithms implemented by SSD controllers. One of the inevitable issues with solid state memory is how to gracefully deal with bad blocks.

Bad blocks are flash memory blocks that contain one or more invalid bits whose reliability is not guaranteed because of faulty dies, over-charge leaks or wear-out. Bad blocks may exist even on a new disk.

Bad Block Management or Intelligent Block Management is an algorithm that monitors and maintains bad blocks within the NAND device. The controller maintains tables of known bad blocks and can replace new bad blocks with spare blocks that are reserved for this use. Typically 4-10% of the usable capacity is reserved for Bad Block Management. This practice further enhances the overall SSD lifespan and ensures that a few bad blocks will not affect the integrity of the drive, and that the SSD device continues to operate.

Bad blocks are mapped as “do not use” and substituted with known good blocks from the percentage set aside as spare blocks. The percentage of bad blocks that can be accommodated is a product marketing decision, which is why a 128GB SSD device will only present 120GB as available to use. The spare blocks come from over-provisioning inside the SSD, capacity that is invisible to the user.

If for any reason the bad blocks exceed the remaining spare blocks, the SSD fails, because the controller can no longer safely substitute a good block for a newly failed one, and that can result in data loss.
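The remapping logic described above can be sketched in a few lines. This is a toy model of what a controller does in firmware; the class and method names are my own, not from any vendor:

```python
class BadBlockManager:
    """Toy model of Intelligent Bad Block Management: bad logical
    blocks are remapped to spare blocks; the drive fails only when
    the spare pool is exhausted."""

    def __init__(self, usable_blocks, spare_blocks):
        self.remap = {}  # logical block -> spare block
        self.spares = list(range(usable_blocks,
                                 usable_blocks + spare_blocks))

    def physical_block(self, logical):
        # Reads and writes are redirected if the block was retired
        return self.remap.get(logical, logical)

    def retire(self, logical):
        if not self.spares:
            raise RuntimeError("spare pool exhausted - drive failure")
        self.remap[logical] = self.spares.pop(0)

    @property
    def retired_block_count(self):  # the counter SMART tools report
        return len(self.remap)

bbm = BadBlockManager(usable_blocks=1000, spare_blocks=50)
bbm.retire(42)                 # block 42 found bad, remapped to a spare
print(bbm.physical_block(42))  # 1000 (first spare block)
print(bbm.retired_block_count) # 1
```

The over-provisioned spares live past the end of the user-visible address range, which mirrors why the drive advertises less capacity than the raw flash it contains.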

Retired block count

The counter to watch for Bad Block Management is “Retired Block Count”; expect that count to go up as “SSD Life Left” nears 5-10%.

A SSD Technology a Day (6) – CacheCade

This entry is part 7 of 9 in the series One SSD Technology a Day

CacheCade is a technology developed by LSI for its MegaRAID storage controllers. CacheCade software allows you to mix inexpensive SATA or even SAS hard disk drives with up to 512GB of solid state storage capacity, distributed over a few SSD drives, to provide a substantial performance boost without adding additional SATA HDDs or moving to an all-SSD RAID volume to achieve performance requirements.

CacheCade - Image (c) Copyright LSI Corporation

This combination of HDDs with SSDs as a secondary cache is best suited to random-read-intensive applications, where hot data can be moved to SSD storage in order to take advantage of the low latency, high IOPS characteristics of SSDs at a reasonable price.
This technology is available on the LSI MegaRAID 9260 and 9280 controller series as well as on re-badged RAID controllers like the Dell PERC H700 and H800 with 1GB cache.

While CacheCade version 1.0 offers only read caching (the only version supported by Dell), CacheCade 2.0 offers read and write caching for impressive results. This technology requires an inexpensive hardware license ($300). Read more about it here

As mentioned earlier, Dell also offers the CacheCade 1.0 technology for Dell PERC H700 and H800 controllers with 1 GB NVRAM and firmware version 7.2 or later. This is an excellent solution that combines SSDs with regular HDD arrays in order to intelligently store hot data on 1 or more SSDs (the maximum SSD pool is 512GB).
In their whitepaper they claim to double the number of transactions using up to 4x50GB SSDs.
Read more here:
and here on technical details
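The hot-data idea behind a read cache like CacheCade can be illustrated with a toy model. This is my own simplification; the real promotion heuristics inside the controller are proprietary. Blocks read often enough get promoted into a limited SSD pool and served from there:

```python
from collections import OrderedDict

class SSDReadCache:
    """Toy hot-data read cache: HDD blocks read often enough are
    promoted into a fixed-size SSD pool; least-recently-used blocks
    are evicted. Simplified model, not the actual CacheCade logic."""

    def __init__(self, capacity_blocks, promote_after=2):
        self.capacity = capacity_blocks
        self.promote_after = promote_after
        self.read_counts = {}
        self.ssd = OrderedDict()  # block -> data, kept in LRU order

    def read(self, block, hdd):
        if block in self.ssd:              # SSD hit: fast path
            self.ssd.move_to_end(block)
            return self.ssd[block], "ssd"
        data = hdd[block]                  # slow path: read from HDD
        self.read_counts[block] = self.read_counts.get(block, 0) + 1
        if self.read_counts[block] >= self.promote_after:
            if len(self.ssd) >= self.capacity:
                self.ssd.popitem(last=False)   # evict the LRU block
            self.ssd[block] = data             # promote the hot block
        return data, "hdd"

hdd = {n: f"data-{n}" for n in range(100)}
cache = SSDReadCache(capacity_blocks=8)
cache.read(7, hdd)         # first read comes from the HDD
cache.read(7, hdd)         # second read promotes block 7 to SSD
print(cache.read(7, hdd))  # ('data-7', 'ssd')
```

The design point worth noting is that only repeatedly read (hot) blocks earn a slot in the small SSD pool, which is exactly why the approach shines for random-read-intensive workloads and does little for one-pass scans.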

A SSD Technology a Day (5) – Wear Leveling

This entry is part 6 of 9 in the series One SSD Technology a Day

Wear leveling is a technique used in solid state drive controllers to prolong the service life of flash memory. As mentioned in the 2nd post of this blog series, What’s the difference between SLC and MLC?, flash memories have limited endurance, measured by the number of P/E cycles the memory can perform before becoming degraded. Wear leveling ensures that all cells receive the same number of P/E cycles (even wear), so that a few cells on the drive do not receive the majority of the writes and wear out early. That could cause the drive to fail, well ahead of its prescribed service life, while most of the memory on it is still usable.

Memory wear-out concerns are unique to flash-based memory. Hard disks store data by magnetizing a thin film of ferromagnetic material on a platter. DRAM is volatile memory (it stores data only while powered on). Flash memory stores data inside the NAND cell via a process called tunneling, in which the floating gate is flooded with a high voltage. This leaves a charge in the NAND cell, and that charge can be read over and over. Because of this invasive writing method and the corresponding erasing method, flash cells degrade over time.

The wear leveling algorithm basically stores the P/E count of each cell and writes the next block to the “least used available cell,” so that cells that were used intensively are pushed to the end of the queue until all cells have even wear.
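That selection policy can be sketched with a min-heap keyed on P/E count. This is illustrative only; real controllers work on erase blocks and track far more state:

```python
import heapq

class WearLeveler:
    """Toy wear-leveling allocator: every write goes to the block
    with the lowest P/E count, so wear stays even across the drive."""

    def __init__(self, num_blocks):
        # min-heap of (pe_count, block_id); all blocks start unworn
        self.heap = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.heap)

    def allocate_for_write(self):
        pe, block = heapq.heappop(self.heap)        # least-worn block
        heapq.heappush(self.heap, (pe + 1, block))  # charge one P/E cycle
        return block

wl = WearLeveler(num_blocks=4)
for _ in range(8):
    wl.allocate_for_write()
print(sorted(wl.heap))  # every block ends up with exactly 2 P/E cycles
```

After 8 writes over 4 blocks, each block has absorbed exactly 2 P/E cycles, which is the "even wear" outcome the paragraph above describes.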

One caveat is that new disks will perform much better than ones that have been used intensively, because all cells are good candidates for writing. Once the disk has been used, there is performance degradation due to the need to erase the cell selected by wear leveling. So good advice is not to benchmark a brand new drive, but to first write 2-3 times the capacity of the drive before starting tests (e.g. a 240GB SSD drive should have 500-750GB of lifetime writes) in order to simulate a real production scenario. One other thing that affects this performance is Static Data Rotation, which is discussed in the first post of the series. Lifetime writes can be queried using CrystalDiskInfo and other free tools.

Disk parameters to check

A SSD Technology a Day (2) – What’s the difference between SLC and MLC?

This entry is part 3 of 9 in the series One SSD Technology a Day

It’s time to dissect the two main types of flash chips in order to understand why not all SSDs are created equal. What is, after all, the physical difference between SLC and MLC?

SLC stands for Single-Level Cell and, just like the name suggests, it can store one bit per NAND gate; hence an SLC cell has two states, 0 or 1, based on the charge of the NAND gate.

SLC Reference Levels


MLC, on the other hand, stands for Multi-Level Cell and uses multiple voltage threshold levels in order to store 2 or even 3 bits (also called TLC – Triple-Level Cell) in the same NAND gate. This is done by coding 4 or even 8 states (in the case of 3-bit TLC) on the same gate, so a 2-bit MLC cell will typically hold one of the following states: 11, 10, 01 or 00. The benefit over SLC is the increased capacity per chip (2 or 3 times more), but at the same time the voltage reference levels are a lot tighter, which leads to more rapid degradation of the cell after many P/E (Program/Erase) cycles. Once the MLC NAND gate has degraded, reads are no longer predictable because the stored value overlaps the reference levels. In this case the memory will report an error or, if the controller supports it, it will retire the cell and replace it with one from the reserve capacity.

MLC Reference Levels (2-bit cell)
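Reading a cell boils down to comparing the stored charge against those reference levels. A toy decoder (assumed numbers, evenly spaced levels, simple binary ordering rather than the Gray coding real chips may use) makes the overlap problem concrete:

```python
def decode_cell(charge, bits_per_cell, window=4.0):
    """Map a cell's charge to a stored value by finding which band
    between reference levels it falls into. Evenly spaced levels
    over an assumed 0..window volt range - illustrative only."""
    states = 2 ** bits_per_cell
    band_width = window / states
    state = min(int(charge / band_width), states - 1)
    return format(state, f"0{bits_per_cell}b")

# SLC: 2 wide bands, so a drifted charge still decodes correctly
print(decode_cell(1.7, bits_per_cell=1))  # '0'  (band 0.0-2.0 V)

# MLC: 4 narrow bands - a few tenths of a volt of drift flips the value
print(decode_cell(1.7, bits_per_cell=2))  # '01' (band 1.0-2.0 V)
print(decode_cell(2.1, bits_per_cell=2))  # '10' - drift crossed a level
```

With twice the states packed into the same window, the same amount of charge drift that SLC shrugs off pushes an MLC cell into the neighboring band, which is exactly the degraded-read failure described above.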

The typical number of write cycles is pretty solid at around 100K for SLC and floats around 10K for MLC (different dies can have very different quality and will wear differently). This number is still high enough for a consumer lifecycle: MLC lasts 5 years if the entire memory is programmed 5 times daily, and SLC runs up to 50 years under the same usage.


Type of flash cell       SLC         MLC (2-bit)      TLC (3-bit)
Bits per cell            1           2                3
States stored            0, 1        00, 01, 10, 11   000 through 111 (8 states)
Typical capacity/chip    32GB        64GB             96GB
Endurance (P/E cycles)   100K        10K-30K          <1K
Performance over time    Constant    Degrades         Quickly degrades
Application              Enterprise  Consumer         Thumb drives, camera cards

In the case of MLC, the program cycle takes 2 or 3 times longer than for SLC, since the programming signal has to be a lot more precise to code 4 states in the space of 2. This leads to higher speed and an increased number of IOPS (IO Operations Per Second) for SLC memory compared to MLC.

A SSD Technology a Day (1) – Static Data Rotation

This entry is part 2 of 9 in the series One SSD Technology a Day

One of the main drawbacks of SSDs has been reliability. Every NAND cell has a certain prescribed number of Program/Erase (P/E) cycles, and as data is written to disk, chances are it will remain unchanged for weeks or months. That means the cells used to store that data will keep the same wear level (used P/E cycles) for as long as the data is unchanged. This becomes a problem because the remaining free cells are taxed even more and could reach their end of life, making the entire drive read-only or even failing it completely.
I discovered this technology while trying to explain the degraded performance of my new OCZ Vertex 3 SSD drive. I ran a bunch of tests using SQLIO, based on Jonathan Kehayias's (Blog|Twitter) post about Parsing SQLIO Output to Excel Charts using Regex in PowerShell, with a 6GB file, and I got some good results. I started using the drive and installed a few VMs until 50% of the drive was full. At that point I kept running SQLIO and CrystalDiskMark tests, only to see performance sinking more and more.

Little did I know that the OCZ Vertex 3, which is based on the SandForce 2281 chipset, implements an intelligent Static Data Rotation algorithm as part of DuraClass (SandForce's set of technologies for increasing drive reliability). This means that during idle periods the SSD controller actively rotates static data from intensively used cells to the least used cells, to allow the drive's wear leveling to work at its best. But what happens when you stress test the disk and push about 3 times the size of the drive worth of data through it in a couple of hours while half of the drive is full? The SandForce DuraClass algorithm will kick in and start moving data around even when the drive is not idle, and the user will see a decrease in performance until the wear level stabilizes.

Essentially, Static Data Rotation is there to make sure you can use the drive for the MTTF prescribed by the manufacturer, and it prevents premature wear on the cells that store hot data.
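A rough sketch of the rotation idea follows. This is my own simplification of what an algorithm like DuraClass might do during idle time; the actual implementation is proprietary. Cold (static) data is moved off the least-worn blocks onto heavily worn free blocks, so the fresh blocks rejoin the pool available for hot writes:

```python
def rotate_static_data(blocks, wear_gap=100):
    """Toy static data rotation. `blocks` maps block_id ->
    {'pe': P/E count, 'static': holds unchanged data?}.
    Moves static data from the least-worn blocks onto the most-worn
    free blocks and returns the (from_block, to_block) moves made."""
    static = sorted((b for b, s in blocks.items() if s["static"]),
                    key=lambda b: blocks[b]["pe"])
    free = sorted((b for b, s in blocks.items() if not s["static"]),
                  key=lambda b: -blocks[b]["pe"])
    moves = []
    for src, dst in zip(static, free):
        # only worth moving if the wear difference is large enough
        if blocks[dst]["pe"] - blocks[src]["pe"] >= wear_gap:
            moves.append((src, dst))
            blocks[src]["static"], blocks[dst]["static"] = False, True
    return moves

blocks = {
    0: {"pe": 10,  "static": True},   # cold data parked on a fresh block
    1: {"pe": 900, "static": False},  # heavily worn free block
    2: {"pe": 500, "static": False},
}
print(rotate_static_data(blocks))     # [(0, 1)]
```

The `wear_gap` threshold reflects the real trade-off: every rotation itself costs a P/E cycle, so the controller should only move data when the wear imbalance justifies it.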

There is an interesting post on the OCZ Technology Forum about this.

UPDATE: Nitin Salgar (b|t) asked a very good question on Twitter after reading my post:

“Is Static Data Rotation in SSD a common phenomenon across all manufacturers?”

The answer is no; this is one of the strong selling points of the newer SandForce SSD controllers that implement DuraClass. Newer Intel controllers have this technology as well, but older ones do not. I would like to think that any enterprise-class controller has its own implementation of a Static Data Rotation algorithm.


New blog series: One SSD Technology a Day

This entry is part 1 of 9 in the series One SSD Technology a Day

It’s become a tradition on SQL blogs to start a month-long series on a certain topic and try to blog every day. Today I want to start my first series, on SSD technologies. It has been over a year since I started speaking on this topic, and as it is a hot topic, there are new technologies that I feel are not well explained even on specialty blogs.

I will keep this as a master post and will add links to each of the posts as I publish them.

  1. Static Data Rotation 
  2. What’s the difference between SLC and MLC?
  3. Program and Erase Cycle (P/E)
  4. Redundant Array of Independent Silicon Elements (RAISE)
  5. Wear Leveling
  6. CacheCade
  7. Bad Block Management