What’s the big difference between Oracle and SQL Server indexes (Part I)

This entry is part 1 of 1 in the series SQL Server vs. Oracle

Last week Erin Stellato (blog| twitter) has asked a very interesting question on Twitter that triggered this post:

So how is it you can have good perf in Oracle, w/o IOTs, but in SQL Server, everyone says need CIs for good perf?

There are two aspects to that question

  1. What is the difference between index implementations in Oracle vs. SQL Server?
  2. How do you define good performance?Peeling an Onion

Until now I have not realized that this subject is not very well covered and since it involves diving into the internals  I offered to make an attempt to fill this gap over the Memorial Day long weekend and soon realized that there is more to this than I originally thought it is.  This will be the first post in a set that I am planning to write in the next weeks.

First, let’s begin by comparing the different types of indexes supported by Oracle and SQL Server:

Type of index SQL  Server 2008 R2 Oracle 11g
B-tree indexes Yes (Non-clustered) Yes (Default)
B+ tree indexes Yes (Clustered Indexes) Yes (IOT)
Function Based Indexes Yes (Column has to exist in table) Yes
Filtered Indexes Yes Yes
Hash cluster indexes No Yes
B-Tree cluster indexes No Yes
Bitmap Indexes No Yes

So both Oracle and SQL Server support B-tree indexes which are ordered lists of key values, associated with the storage location of the table record that contains the respective value. So far it doesn’t seem to be a big difference between the two RDBMS. Both also support B+ tree indexes although there are differences between the two implementations. B+ trees are similar to B-trees but the table records are stored in the leaf nodes of the primary key index. This allows  fast access for singleton searches (exact match) or range searches on the primary key of the table.  Microsoft calls them clustered indexes (CI) while Oracle uses the term Index Organized Tables (IOT).

Let’s look at non-clustered indexes which are the default indexes in Oracle. Here the two implementations are quite similar: Indexes are using a B-tree structure that contains  the key values and a unique reference to the storage location of the record. This is where the implementations start to separate:

In Oracle every record has a unique (inside the table) 10 byte pseudocolumn ROWID representing the physical location on disk that the record is stored. When printed, each ROWID is displayed in BASE64 format (A-Za-z0-9+/): AAAAaoAATAAABrXAAA
ROWID has a four-piece format, OOOOOOFFFBBBBBBRRR:

  • OOOOOO: The data object number that identifies the database segment (AAAAao in the example). Schema objects in the same segment, such as a cluster of tables, have the same data object number.
  • FFF: The tablespace-relative datafile number of the datafile that contains the row (file AAT in the example).
  • BBBBBB: The data block that contains the row (block AAABrX in the example). Block numbers are relative to their datafile, not tablespace. Therefore, two rows with identical block numbers could reside in two different datafiles of the same tablespace.
  • RRR: The row in the block.

Users can use ROWIDs directly for several functions:

  • Rowids are the fastest way of accessing  rows.
  • Rowids can be used to see how a table is organized.
  • Rowids uniquely identify rows in a given table.

There are two completely different perspectives when migrating from one RDBMS to another:
Most Oracle users have no concept of clustered or non clustered indexes because primary key indexes created in Oracle are balanced non-clustered indexes that include ROWID for fast access.

A common mistake that Oracle users make is assume that  the indexes are implemented the same way  in SQL server and underestimate the performance hit that comes with using a large composite primary key in SQL server on a Clustered table. It takes a while to understand that the non clustered index size  index size in SQL server can be exponentially bigger in SQL server and is not necesarly proportional to the size of the columns specified in the non clustered index. Based on this, some Oracle experts say that the clustered indexes in SQL Sever have a 110% overhead because index key is stored “twice”, once with the index entry in the intermediate index nodes – and again with the row data at the leaf level.

Another performance hit can come from the  default value of the free space (PCTFREE) that is left on a page  as Oracle leave by default 10% space on each block (at 91% block is split). After a data block is filled to the limit determined by PCTFREE, Oracle considers the block unavailable for the insertion of new rows until the percentage of that block falls beneath the parameter PCTUSED.

The alternative of using a non clustered table (heap)  is not recommended in most cases  and Paul Randal ( blog | Twitter )  has some excellent posts on this subject  as well as Erin’s recent blog post on Internals and Deletes for SQL University which I highly recommend reading.

On the opposite camp, most SQL Server users are looking for an alternative to clustered indexes and end up choosing to use Index Optimized Tables (IOT) instead of using simple tables just because they search for something that has the same structure as clustered tables in SQL Server in order to have comparable performance.

In the next part of my series I will continue with access methods in Oracle, and other index types.

What three events brought you here?

Recently Paul Randal (aka Mr. Tripp)  started a web version of the tag game in the SQL Server blogging community  on what were the three most important events that shaped your career and got you to this point in your life. It is a very difficult question because unlike data, life events do not accept a SELECT TOP 3 … ORDER BY [ImportanceRating]. The sorting function is a subjective one for humans in the case of life changing events and it depends on the impact that one event had on the individual. It is a lot harder than most would think and I spent the weekend thinking about this and trying to review what were the most important events that shaped my life and career. Here’s what I came up with:

1) My FamilySinclair ZX Spectrum+

As I grew up in a communist Romania I was one of the fortunate kids that had access to technology thanks to my parents, both University Computer Science professors. My father is still teaches Databases from the 101 course to Oracle PL\SQL  and also Assembly Language. My mom is now retired but she used to teach the Fundamentals of Computer Programming  to students in the freshman year. As a kid I got to see all the  different generations of technology from things like ferrite memory to BASIC programming on Romanian replica of Sinclair ZX Spectrum+ Z80 home computer. I have thank my parents for explaining to me how things work and as a very inquisitive boy I absorbed all that information without realizing that one day this will help me understand the systems as a whole and see the big picture where others only see the frontend. The first database I worked on was dBase III the precursor of FoxPro (anybody who used it understands why some people call database tables “files”). The interesting side to this is that while I was hooked on computers, my sister decided to go to Law School to finally land in the wireless communications industry.

2) Discovering relational algebra

During third year in the Politehnica University I found an opportunity to apply for a EU scholarship in the frame of Erasmus Programme. I remember that one of the requirements was to be fluent in French and because of that there were only 3 or 4 students that applied and I was selected. I chose the “DB and Artificial Intelligence”  major at “Universite de Paris 6 – Pierre & Marie Curie“  and one of the courses was  “Relational Databases”. We started from the basics of the relational algebra operations and representing query plans as a tree  that can be transformed, all of which which I was fascinated by. I kept my interest in this when I returned home after my year in Paris, I got my Software Engineering MS in “Database Systems Optimizations”.

3)Real world challenge

After my graduation I  decided to continue the family tradition and started teaching Labs on “Programming Languages” and “Relational Database Design” at the  Politehnica University. A few years later I  decided to get a job as Oracle DBA. The first day on the job they gave me a desktop and some form of an install manual for Oracle and their product  and asked me to read it and ask around if I had questions. Two days later, to everybody’s surprise I had their product installed and running on Oracle 9i while it was designed for 8i. This was one of the first moments when I realized that I like challenges and that I can live up to the expectations.
A few years later came a moment when I realized just how lucky I am. It was the moment when I found out that I was one of the one selected for the Diversity Visa program to get a Green Card  and move the US.  

From that point forward I did not find it hard to get a job (I got my first job in US in three days) but I learned that the most important thing is to find the right job and the right employer that would challenge me just like Brent Ozar was writing in his post on the same subject:

I want to be a successful employee in my employer’s eyes, but when I take a job, one of the questions is, “One year after someone’s taken this position, what does success look like?  What is the best employee doing?  How are you rewarding them for what they’ve done?” 
In IT, this question takes people by surprise, but the answers reveal a lot.

Until this day I can proudly say that I have exceeded all my employers’ expectations but I cannot say the same about all the jobs I held.

I’m going to tag Ted Krueger, Jorge Segarra, Mladen Prajdic and Jonathan Kehayias to see how they are answering this question. The list does not have to end here. If you would like answer this question please link back to this post or the original post by Paul.

Why did SQL Server cross the road?

I posted a comment on Kevin Kline’s blog regarding:

Microsoft Marketing Throws SQL Server Under the Bus

“A spokesperson for Microsoft said that the problem was not bandwidth but that its SQL Server database had reached excessive fragmentation levels caused by the tremendous surge of queries”.  Read about it here:

http://www.itnews.com.au/News/102445,windows-7-rc-download-crashes.aspx

and also here:

http://blogs.zdnet.com/Bott/?p=910

Why didn’t anyone involved bother to think thru the fact that competitors and various elements of anti-Microsoft factions can make a lot of hay about how Microsoft’s own product?!?

Just as politicians only ever admit to “wanting to spend more time with family”, a technology company shouldn’t point to their own product as the culprit of a major technology failure, especially when there are thousands of other reasons that the meltdown might’ve happened.

What are your thoughts?

Here are my thoughts.

I think the truth is somewhere in the middle. I have worked with both Oracle and SQL Server and while I agree that in a lot of regards Oracle is a superior product it is also because of their experience.
Let’s face it …Oracle started in 1979 while Microsoft bought Sybase in ’89.
In my opinion the first great product they had were Oracle 8 in 1997 and SQL Server 2005 in  2005. Right now Oracle has a stable multi tiered scalable clustered architecture (including app server) whereas Microsoft is still discussing about deployment dates.
Also … Oracle query optimizer is IMO one of the best in the industry since Oracle 8. Microsoft has finally developed a good (should I say … comparable)query optimizer in 2008.

So while SQL Server is a good product  we should compare apples to apples because IMO Oracle and Microsoft play in different leagues.