In the 2nd post of this series we explored the differences between SLC and MLC and saw that the main issue with MLC is endurance, which in the past prevented its use in enterprise applications. Because the increased capacity of MLC makes it a good fit for enterprise use, flash memory manufacturers looked for a way to improve the endurance characteristics of MLC memories. When analyzing why MLC cells fail sooner than SLC cells, the main culprit turned out to be the tight reference voltage windows: after a number of write cycles, the actual charge left in the cell starts to overlap a reference level, leading to an incorrect value being read from that cell. When that happens a few times, the cell is marked as bad.
The solution was to make the programming cycles more precise in order to widen the margin around the reference levels and allow more room for error as the memory cell degrades. In addition, the silicon dies are tested and only the ones with better endurance characteristics are selected for enterprise use. This flash memory has been marketed as eMLC or MLC-HET (high endurance).
eMLC reference levels - Image courtesy of www.tomshardware.com
This memory has improved endurance over consumer-grade MLC, with one downside: programming time increases in order to allow for the more precise reference levels.
The average write endurance for this type of memory is between 10K and 30K P/E cycles.
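To make that mechanism concrete, here is a small, purely illustrative Python sketch (not an actual controller algorithm, and with made-up voltage numbers) of why more precise programming helps: placing the charge closer to its nominal target leaves more margin to the read reference levels, so the same rate of charge drift takes longer to produce a misread.

```python
# Toy illustration: decoding a 2-bit cell against reference levels.
# All voltage values are hypothetical, chosen only to show the effect.

REFERENCES = [1.0, 2.0, 3.0]                              # read reference levels
TARGETS = {0b11: 0.5, 0b10: 1.5, 0b01: 2.5, 0b00: 3.5}    # nominal program targets

def read_cell(voltage):
    """Decode a cell by comparing its voltage against the reference levels."""
    level = sum(voltage > ref for ref in REFERENCES)
    return [0b11, 0b10, 0b01, 0b00][level]

def cycles_until_misread(placement_error, drift_per_cycle=0.002, value=0b10):
    """How many cycles of charge drift until the stored value reads back wrong."""
    voltage = TARGETS[value] + placement_error    # programming inaccuracy
    cycles = 0
    while read_cell(voltage) == value:
        voltage += drift_per_cycle                # charge drift as the cell wears
        cycles += 1
    return cycles

print("consumer MLC (loose programming):", cycles_until_misread(0.30))
print("eMLC (precise programming):      ", cycles_until_misread(0.05))
```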
This post comes to you from the shores of Lake Michigan in Muskegon where we are spending the weekend at a cottage in the company of good friends. In order to continue this SSD saga I found myself forced to write this using the WordPress iPhone app much like the character played by Robin Williams in the movie RV. Please excuse any spelling errors that you might find.
In previous posts I talked about the wear leveling and RAISE algorithms implemented by SSD controllers. One of the inevitable issues with solid state memory is how to gracefully deal with bad blocks.
Bad blocks are flash memory blocks that contain one or more invalid bits and whose reliability is not guaranteed, caused by faulty dies, charge leakage or wear-out. Bad blocks may exist even on a new disk.
Bad Block Management, or Intelligent Block Management, is an algorithm that monitors and maintains bad blocks within the NAND device. The controller maintains tables of known bad blocks and can replace newly discovered bad blocks with spare blocks reserved for this purpose; typically 4-10% of the capacity is set aside for Bad Block Management. This practice further enhances the overall SSD lifespan and ensures that a few bad blocks will not affect the integrity of the drive, so the SSD can keep operating.
Bad blocks are mapped as “do not use” and substituted with known good blocks from the percentage set aside as spares. The percentage of bad blocks that can be accommodated is a product marketing decision, and that is the reason why a 128GB SSD will only present 120GB as available to use. The spare blocks come from this over-provisioned capacity inside the SSD, which is invisible to the user.
If, for any reason, the bad blocks exceed the remaining spare blocks, the SSD fails because the controller can no longer safely substitute a good block for a newly failed one, which can result in data loss.
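A minimal sketch of that remapping idea, assuming the controller keeps a table from retired blocks to spares (block numbers and pool sizes below are arbitrary; real controllers do this in firmware):

```python
class BadBlockManager:
    def __init__(self, user_blocks, spare_blocks):
        self.remap = {}                                   # bad block -> spare block
        self.spares = list(range(user_blocks, user_blocks + spare_blocks))

    def physical_block(self, logical_block):
        """Return the block actually used for this logical block."""
        return self.remap.get(logical_block, logical_block)

    def retire(self, bad_block):
        """Mark a block 'do not use' and substitute a known good spare."""
        if not self.spares:
            raise RuntimeError("spare pool exhausted: drive fails or goes read-only")
        self.remap[bad_block] = self.spares.pop(0)

bbm = BadBlockManager(user_blocks=1000, spare_blocks=50)   # ~5% reserved as spares
bbm.retire(42)                         # block 42 failed verification after a write
print(bbm.physical_block(42))          # reads/writes now land on a spare block
```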
Retired block count
The metric to watch for Bad Block Management is “Retired Block Count”, but expect that count to go up as “SSD Life Left” nears 5-10%.
CacheCade is a technology developed by LSI for its MegaRAID storage controllers. CacheCade software allows you to mix inexpensive SATA or even SAS hard disk drives with up to 512GB of solid state storage capacity, distributed over a few SSDs, to provide a substantial performance boost instead of adding more SATA HDDs or moving to an all-SSD RAID volume to meet performance requirements.
CacheCade - Image (c) Copyright LSI Corporation
This combination of HDDs with SSDs as a secondary cache is best suited to random-read-intensive applications, where hot data can be moved to SSD storage in order to take advantage of the low latency and high IOPS characteristics of SSDs at a reasonable price.
This technology is available on LSI MegaRAID 9260 and 9280 controller series as well as on re-badged RAID controllers like Dell PERC H700 and H800 with 1GB Cache.
While CacheCade version 1.0 offers only read caching (the only version supported by Dell), CacheCade 2.0 offers read and write caching for impressive results. This technology requires an inexpensive hardware license (around $300). Read more about it here.
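The general idea of an SSD read cache in front of HDDs can be illustrated with a toy sketch like the one below. This is only a conceptual illustration, not LSI's actual CacheCade implementation: reads are counted per HDD block, and blocks that turn hot are promoted into a limited-size SSD cache so subsequent reads are served at SSD latency.

```python
from collections import Counter

class SsdReadCache:
    def __init__(self, capacity_blocks, promote_after=3):
        self.capacity = capacity_blocks      # how many blocks fit on the SSD tier
        self.promote_after = promote_after   # reads before a block counts as "hot"
        self.cached = set()
        self.read_counts = Counter()

    def read(self, block):
        if block in self.cached:
            return "SSD hit"                 # low latency path
        self.read_counts[block] += 1
        if self.read_counts[block] >= self.promote_after and len(self.cached) < self.capacity:
            self.cached.add(block)           # copy the hot block to the SSD tier
        return "HDD read"

cache = SsdReadCache(capacity_blocks=2)
print([cache.read(7) for _ in range(5)])     # block 7 turns hot, then hits the SSD
```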
Wear leveling is a technique used in SSD controllers to prolong the service life of flash memory. As mentioned in the 2nd post of this blog series, What’s the difference between SLC and MLC?, flash memories have limited endurance, measured by the number of P/E cycles that the memory can perform before becoming degraded. Wear leveling ensures that all cells get roughly the same number of P/E cycles (even wear), so that a few cells on the drive do not receive the majority of the writes and wear out early. That could cause the drive to fail while most of the memory on it is still usable, well ahead of the prescribed service life.
Memory wear-out concerns are unique to flash-based memory. Hard disks store data by magnetizing a thin film of ferromagnetic material on a platter. DRAM is volatile memory (it stores data only while powered on). Flash memory stores data inside the NAND cell via a process called tunneling, in which the floating gate is flooded with high voltage. This leaves a charge in the NAND cell, and that charge can be read over and over. Because of this invasive writing method, and the corresponding erasing method, flash cells degrade over time.
The wear leveling algorithm basically stores the P/E count of each cell and directs the next write to the “least used available cell”, so that cells that have been used intensively are pushed to the end of the queue until all cells have even wear.
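A minimal sketch of that selection logic, assuming wear is tracked per erase block and the free blocks are kept in a priority queue ordered by P/E count (this is only an illustration of the concept, not any vendor's implementation):

```python
import heapq

class WearLeveler:
    def __init__(self, num_blocks):
        # min-heap of (P/E count, block id) pairs for all free blocks
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        """Pick the least-worn free block for the next write."""
        pe_count, block = heapq.heappop(self.free)
        return block, pe_count

    def release(self, block, pe_count):
        """The block was erased; return it to the pool with its new P/E count."""
        heapq.heappush(self.free, (pe_count + 1, block))

wl = WearLeveler(num_blocks=4)
blk, pe = wl.allocate()          # always returns the block with the fewest erases
wl.release(blk, pe)              # heavily worn blocks drift to the back of the queue
```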
One caveat is that a new drive will have much better performance than one that has been used intensively, because all cells are good candidates for writing. Once the drive has been used, there is some performance degradation due to the need to erase the cell selected by wear leveling. So a good piece of advice is not to benchmark a brand new drive: first write 2-3 times the capacity of the drive (e.g. a 240GB SSD should have 500-750GB of lifetime writes) in order to simulate a real production scenario instead of just benchmarking a fresh disk. One other thing that affects this performance is Static Data Rotation, which is discussed in the first post of the series. Lifetime writes can be queried using CrystalDiskInfo and other free tools.
One of the limitations of flash memory is that while it can be read or programmed a byte or a word at a time in a random access fashion, just like regular RAM, it can only be erased a “block” at a time. Erasing sets all bits in the block to 1, which is the default state for NAND memory.
Overwriting a byte in flash memory therefore involves two steps: Program and Erase (P/E). The data is written to a freshly erased block, and the old block needs to be erased before it can be reused.
Programming can be done at the cell level (setting it to the “0” state) via a process called tunneling, in which the floating gate is flooded with high voltage using the on-chip charge pumps.
Erasing can be done only on an entire block (resetting it to the “1” state), through a high negative voltage that pulls the electrons off the floating gate via a process called quantum tunneling. Flash memory is divided into erase segments (often called blocks or sectors).
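A simplified model of that asymmetry, with tiny, hypothetical sizes (real NAND blocks are hundreds of kilobytes or more): bits can only be programmed from 1 to 0, so changing data that is already in a block forces a copy, a block-level erase back to all 1s, and a reprogram.

```python
BLOCK_SIZE = 8
block = [0xFF] * BLOCK_SIZE            # erased state: every bit is 1

def program(block, offset, value):
    """Programming can only clear bits (1 -> 0) in an erased cell."""
    block[offset] &= value

def erase(block):
    """Erase works only on the whole block, resetting every bit to 1."""
    for i in range(len(block)):
        block[i] = 0xFF

def rewrite_byte(block, offset, new_value):
    """Changing already-programmed data needs a full copy / erase / reprogram."""
    snapshot = list(block)             # copy out the live data
    erase(block)                       # block-level erase
    snapshot[offset] = new_value
    for i, byte in enumerate(snapshot):
        program(block, i, byte)        # reprogram the whole block

program(block, 0, 0xA5)                # first write: fine, bits only go 1 -> 0
rewrite_byte(block, 0, 0x5A)           # update: requires the erase cycle
```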
It’s time to dissect the two main types of flash chips in order to understand why not all SSDs are created equal. What is, after all, the physical difference between SLC and MLC?
SLC stands for Single Level Cell and, just like the name suggests, it can store one bit per NAND gate, so an SLC cell has two states: 0 or 1, based on the charge of the NAND gate.
SLC Reference Levels
MLC, on the other hand, stands for Multi-Level Cell and uses multiple voltage threshold levels in order to store 2 or even 3 bits (the latter also called TLC, Triple-Level Cell) in the same NAND gate. This is done by coding 4 or even 8 states (in the case of 3-bit TLC) on the same gate, so an MLC cell will typically hold one of the following states: 11, 10, 01 or 00. The benefit over SLC is the increased capacity per chip (2 or 3 times more), but at the same time the voltage reference levels are a lot tighter, which leads to more rapid degradation of the cell after many P/E (Program/Erase) cycles. Once the MLC NAND gate has degraded, reads are no longer predictable because the stored value overlaps the reference levels. In that case the memory will report an error or, if the controller supports it, it will retire the cell and replace it with one from the reserve capacity.
MLC Reference Levels (2bit cell)
Typical write endurance is pretty solid at around 100K cycles for SLC and floats around 10K for MLC (different dies can vary widely in quality and will wear differently). For MLC this is still high enough for a consumer lifecycle: if the entire memory is programmed 5 times daily it lasts about 5 years, while SLC runs up to 50 years under the same usage.
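As a quick sanity check of those figures, here is the back-of-the-envelope arithmetic, assuming the whole drive is rewritten 5 times per day and the typical endurance numbers quoted above:

```python
# 100K and 10K P/E cycles divided by 5 full-drive writes per day, in years
for name, pe_cycles in [("SLC", 100_000), ("MLC", 10_000)]:
    years = pe_cycles / 5 / 365
    print(f"{name}: {pe_cycles:,} cycles / 5 writes per day = {years:.0f} years")
# MLC: ~5 years under this workload; SLC: ~55 years, roughly the 50-year figure.
```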
| Type of flash cell | SLC | MLC (2-bit) | TLC (3-bit) |
|---|---|---|---|
| Bits / cell | 1 | 2 | 3 |
| States stored | 0, 1 | 00, 01, 10, 11 | 000, 001, 010, 011, 100, 101, 110, 111 |
| Typical capacity / chip | 32GB | 64GB | 96GB |
| Endurance (P/E cycles) | 100K | 10K-30K | <1K |
| Performance over time | Constant | Degrades | Quickly degrades |
| Application | Enterprise | Consumer | Thumb drives, camera cards |
In the case of MLC, a program cycle takes 2 to 3 times longer than for SLC, since the programming signal has to be a lot more precise to code 4 states in the voltage space of 2. This translates to higher speed and a greater number of IOPS (IO Operations Per Second) for SLC memory compared to MLC.
One of the main drawbacks of SSDs has been reliability. Every NAND cell has a prescribed number of Program/Erase (P/E) cycles, and once data is written to disk, chances are it will remain unchanged for weeks or months. That means the cells storing that data keep the same wear level (used P/E cycles) for as long as the data stays unchanged. This becomes a problem because the remaining free cells are taxed even more and could reach their end of life, making the entire drive read-only or even failing it completely.
I discovered this technology while trying to explain the degraded performance of my new OCZ Vertex 3 SSD. I ran a bunch of tests using SQLIO, based on Jonathan Kehayias’ (Blog|Twitter) post about Parsing SQLIO Output to Excel Charts using Regex in PowerShell, with a 6GB file, and I got some good results. I started using the drive and installed a few VMs until 50% of the drive was full. At that point I kept running SQLIO and CrystalDiskMark tests, only to see the performance sinking more and more.
Little did I know that the OCZ Vertex 3, which is based on the SandForce 2281 chipset, implements an intelligent Static Data Rotation algorithm as part of DuraClass (SandForce’s set of technologies for increasing drive reliability). This means that the SSD controller actively moves static data from intensively used cells to the least used cells during idle periods, to allow the drive's wear leveling to work at its best. But what happens when you stress test the disk and push about 3 times the size of the drive worth of data through it in a couple of hours, while half of the drive is full? The SandForce DuraClass algorithm will kick in and start moving data around even when the drive is not idle, and the user will see a decrease in performance until the wear level is stabilized.
Essentially, Static Data Rotation is there to make sure you can use the drive for the MTTF prescribed by the manufacturer, by preventing premature wear on the cells that hold hot data.
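A rough sketch of the idea, assuming the controller tracks a P/E count and the live data for each block; this is only an illustration of the concept, not SandForce's actual DuraClass implementation:

```python
def rotate_static_data(blocks, idle):
    """blocks: dict block_id -> {'pe': erase count, 'data': payload or None}"""
    if not idle:
        return                                   # normally runs only during idle time
    used = [b for b, info in blocks.items() if info["data"] is not None]
    free = [b for b, info in blocks.items() if info["data"] is None]
    if not used or not free:
        return
    coldest = min(used, key=lambda b: blocks[b]["pe"])        # static data on fresh cells
    most_worn_free = max(free, key=lambda b: blocks[b]["pe"]) # worn cells sitting unused
    # move the static data onto the worn block, freeing the fresh block for new writes
    blocks[most_worn_free]["data"] = blocks[coldest]["data"]
    blocks[coldest]["data"] = None
    blocks[coldest]["pe"] += 1                   # the freed block gets erased
```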
UPDATE: Nitin Salgar (b|t) has asked a very good question on Twitter after reading my post:
“Is Static Data Rotation in SSD a common phenomenon across all manufacturers?”
The answer is no; this is one of the strong selling points of the newer SandForce SSD controllers that implement DuraClass. Newer Intel controllers have this technology as well, but older ones do not. I would like to think that any enterprise-class controller has its own implementation of a Static Data Rotation algorithm.
It’s become a tradition on the SQL blogs to start a month-long series on a certain topic and try to blog every day. Today I want to start my first series, on SSD technologies. It has been over a year since I started speaking on this topic, and since this is a hot area there are new technologies that I feel are not well explained even on specialty blogs.
I will keep this as a master post and will add links to each of the posts as I publish them.
I’ve been meaning to write my blog post on last week’s SQL Saturday experience, so I finally got to work and here it is:
Bored Member
After the previous weekend marked a record with 5 (five) SQL Saturday events all around the world on Apr 14, it was time for Wisconsin to show what they can do. And boy, did they put on a great event. The organizing team did a fabulous job, helping 227 hungry minds learn about SQL Server from 33 speakers in 36 sessions organized across 6 tracks. They almost made it look easy.
I remember meeting the person responsible for this 2 years ago at SQL Saturday #31 in Chicago. Back then Jes Borland (b|t) had traveled 160 miles to volunteer at the event. She enjoyed it so much that she came back the next year as a speaker, and a year after that, with the help of her great friends from MADPASS, she put on an epic first-time event! This goes to show how important a single person can be in the SQL Family, and anyone who thinks they will never be able to rise to the level of some of the speakers should learn a lesson from Jes. Every marathon starts with a first step (and she will run one in Kenosha next week too), and every little bit helps.
I left work Friday afternoon, after a pretty busy day, with Vince (b|t), and we headed north to the land of beer and cheese. We got to Madison, checked in at our hotel, and shortly after I was headed to the speaker dinner. The atmosphere was great, the beer was good and the lasagna was too much for me to handle in one sitting. The highlight of the night was the paper airplane fight started by Aaron Lowe (b|t) and continued in part by me. We had a lot of fun, unaware that above our heads was a clothesline with underwear and shirts to complete the Italian experience.
The next day we woke up bright and early, had a frugal breakfast at the hotel and headed to Madison Area Technical College for the event. We registered, grabbed some coffee and a bagel and, after finding the speaker room, we chose Mike Donnelly’s (b|t) SSIS: Figuring Out Configuring session, his first SQL Saturday presentation ever. He did a great job with a demo-packed session that drew a very good audience.
Unicorns and gang signs w/ @StrateSQL
Next we decided to attend Erin Stellato’s (b|t) DBCC Commands: The Quick and the Dangerous. She did a great job condensing all the DBCC commands into one session. I could not help dropping the “unicorns will die if you shrink the database” slogan, and the room liked it. The same room hosted an extraordinarily fun session on what happens when you think outside of the reporting box: Stacia Misner’s (b|t) (Way Too Much) Fun with Reporting Services was spiced up with a lot of humor with the help of her daughter, Erika Bakse (b|t). They started playing Words With Friends inside an SSRS report and then went under the covers to explain how it is done, and mainly how to use Report Actions in SSRS to create interactions inside a report.
It was lunchtime, and the menu included tasty brats, burgers and the Cows of a Spot tables, the Wisconsin version of Birds of a Feather. I sat at the Data Profiling table and had a great conversation with Ira Whiteside (b|t) and his lovely wife Theresa Whiteside. They were very kind to share with me and Vince some of the experience accumulated over the course of their vast careers working with Data Profiling and Data Quality, as well as some of the methods presented in Ira’s session that started right after lunch. It was an extremely interesting session titled Creating a Metadata Mart w/ SSIS – Data Governance, and we learned a lot from it.
Then it was time for my session on Optimizing SQL Server I/O with Solid State Drives, a session that seems to be very popular (the same session was selected for SQL Saturday #119 in Chicago, as well as for Washington DC and Tampa before). I got some great feedback that I will try to use to improve the format and add some fresh content (whiteboarding on TRIM, Wear Leveling and Bad Block Management). I had the honor of having Norm Kelm (b|t), Craig Purnell (b|t) and Matt Cherwin (t) in the audience. We did an impromptu drawing for Ted Krueger’s book (signed by Ted himself) and we stayed in the same room for the last session of the day, with Sanil Mhatre (b|t) on Asynchronous Programming with Service Broker. He did an excellent job, and it was his first SQL Saturday presentation ever.
It was a great day and an epic event by MADPASS which made me look forward to the next SQL Saturday in Madison.