Last week Erin Stellato (blog| twitter) has asked a very interesting question on Twitter that triggered this post:
So how is it you can have good perf in Oracle, w/o IOTs, but in SQL Server, everyone says need CIs for good perf?
There are two aspects to that question
- What is the difference between index implementations in Oracle vs. SQL Server?
- How do you define good performance?
Until now I have not realized that this subject is not very well covered and since it involves diving into the internals I offered to make an attempt to fill this gap over the Memorial Day long weekend and soon realized that there is more to this than I originally thought it is. This will be the first post in a set that I am planning to write in the next weeks.
First, let’s begin by comparing the different types of indexes supported by Oracle and SQL Server:
|Type of index||SQL Server 2008 R2||Oracle 11g|
|B-tree indexes||Yes (Non-clustered)||Yes (Default)|
|B+ tree indexes||Yes (Clustered Indexes)||Yes (IOT)|
|Function Based Indexes||Yes (Column has to exist in table)||Yes|
|Hash cluster indexes||No||Yes|
|B-Tree cluster indexes||No||Yes|
So both Oracle and SQL Server support B-tree indexes which are ordered lists of key values, associated with the storage location of the table record that contains the respective value. So far it doesn’t seem to be a big difference between the two RDBMS. Both also support B+ tree indexes although there are differences between the two implementations. B+ trees are similar to B-trees but the table records are stored in the leaf nodes of the primary key index. This allows fast access for singleton searches (exact match) or range searches on the primary key of the table. Microsoft calls them clustered indexes (CI) while Oracle uses the term Index Organized Tables (IOT).
Let’s look at non-clustered indexes which are the default indexes in Oracle. Here the two implementations are quite similar: Indexes are using a B-tree structure that contains the key values and a unique reference to the storage location of the record. This is where the implementations start to separate:
In Oracle every record has a unique (inside the table) 10 byte pseudocolumn ROWID representing the physical location on disk that the record is stored. When printed, each ROWID is displayed in BASE64 format (A-Za-z0-9+/): AAAAaoAATAAABrXAAA
ROWID has a four-piece format,
OOOOOO:The data object number that identifies the database segment (
AAAAaoin the example). Schema objects in the same segment, such as a cluster of tables, have the same data object number.
FFF:The tablespace-relative datafile number of the datafile that contains the row (file
AATin the example).
BBBBBB:The data block that contains the row (block
AAABrXin the example). Block numbers are relative to their datafile, not tablespace. Therefore, two rows with identical block numbers could reside in two different datafiles of the same tablespace.
RRR:The row in the block.
Users can use ROWIDs directly for several functions:
- Rowids are the fastest way of accessing rows.
- Rowids can be used to see how a table is organized.
- Rowids uniquely identify rows in a given table.
There are two completely different perspectives when migrating from one RDBMS to another:
Most Oracle users have no concept of clustered or non clustered indexes because primary key indexes created in Oracle are balanced non-clustered indexes that include ROWID for fast access.
A common mistake that Oracle users make is assume that the indexes are implemented the same way in SQL server and underestimate the performance hit that comes with using a large composite primary key in SQL server on a Clustered table. It takes a while to understand that the non clustered index size index size in SQL server can be exponentially bigger in SQL server and is not necesarly proportional to the size of the columns specified in the non clustered index. Based on this, some Oracle experts say that the clustered indexes in SQL Sever have a 110% overhead because index key is stored “twice”, once with the index entry in the intermediate index nodes – and again with the row data at the leaf level.
Another performance hit can come from the default value of the free space (PCTFREE) that is left on a page as Oracle leave by default 10% space on each block (at 91% block is split). After a data block is filled to the limit determined by
PCTFREE, Oracle considers the block unavailable for the insertion of new rows until the percentage of that block falls beneath the parameter
The alternative of using a non clustered table (heap) is not recommended in most cases and Paul Randal ( blog | Twitter ) has some excellent posts on this subject as well as Erin’s recent blog post on Internals and Deletes for SQL University which I highly recommend reading.
On the opposite camp, most SQL Server users are looking for an alternative to clustered indexes and end up choosing to use Index Optimized Tables (IOT) instead of using simple tables just because they search for something that has the same structure as clustered tables in SQL Server in order to have comparable performance.
In the next part of my series I will continue with access methods in Oracle, and other index types.