|
|
As some of you might be aware Paul Randal (blog | twitter) has just announced the launch of a new SQL Skills Community Mentoring program for a few individuals that took part in the SQL Skills Immersion Events. I’m deeply honored to be one of the select few that get access to invaluable advice from what I consider the best team in the SQL Server world.
Allow me to go back to a year ago (and boy what a great year 2011 was), when I could only dream of getting trained by Paul Randal and Kimberly Tripp, but my perseverance paid off and after multiple requests, me and my buddy Vince managed to convince the company to send us to the first week of Immersion training in May this year especially since it was in the western suburbs of Chicago. Very soon I realized that this is by far the best training I have ever received, period. The amount of content was overwhelming (as Brad put it “it’s like drinking from a proverbial fire hose”) but the presentation style was extremely engaging and well prepared. Paul and Kimberly don’t just tell you “this is what you should/shouldn’t do” they go beyond that and explain why and how and then they demo the whole concept. Just like a good teacher they can take a difficult topic and present it in a way that makes it easy for you to follow and absorb a lot of information.
On the fourth day of the IE 1 class I volunteered to present a slide deck. I was extremely embarrassed as I felt like I was not in the same league as the rest of the folks present, but everything went so well that it made me reconsider my thoughts. As a matter of fact it made me submit a couple sessions for SQL Saturday #82 in Indianapolis and this was the beginning of my public speaking adventure. A few days ago I actually presented the same slide deck from IE1 at SQL Saturday #96 in Washington, DC and the feedback was excellent.
Last month I continued my learning odyssey with the second week of training (IE2). Right away the fact that we attended IE1 made a huge difference and it helped us understand a lot better the subjects presented to us. This time it wasn’t just Paul and Kimberly but Jonathan and Joe have joined them adding their technical skills and expertise to the table. I learned so many new things that I have already used in my environment.
To conclude, these classes have helped me to understand significantly better how SQL Server works. Things like PLE, forwarding records, heap fragmentation, index tuning, page structure, SAN performance, waits and stats, extended events, NUMA architectures, MCM program are only a few of the subjects that were explained in great detail.
Now on the last binary day of the century all this becomes even better by having the unique chance to have someone like Joe Sack as Mentor and working with him to advance my career. It’s been an incredible year for me and I look forward to work with my Mentor Joe Sack and see what the next 6 months will bring !
Once more, thank you SQLskills. It is an honor to be in such good company.
This is the second time I’m writing a post in the “Meme Monday” series and this time it originates from Tom LaRock (blog | twitter). The topic is : What #SQLfamily means to me.
This weekend I participated in a #SQLfamily reunion in Washington, DC. Some might argue that this is called SQL Saturday #96 but I prefer to see it as a family reunion because this is how it felt. It started with a #SQLfamily dinner where we had a good time told stories (some would even make a good Agatha Christie book)… but I digress. We all sat around the table and felt like we are part of a family. I met Tom LaRock, Joey Dantoni, Karen Lopez, Andy Leonard, Brian Moran, Aaron Nelson, Matt Velic, Chris Skorlinski, Wayne Sheffield, Sebastian Meine and many others. I truly felt like a family reunion and it was.
On Saturday we continued, had fun presenting our sessions, learning new things, hanging out in the speaker room and talking about all kinds of things. Everybody contributed to making this #SQLfamily reunion an event to remember. Tom even offered to make coffee and but had no idea what to do with the extension cord. (…after the last power outage they had in New England you would think he should know better )
Looking back I realize that every #SQLfamily event I have participated to has made me feel like this whether it was a SQL Saturday in Indianapolis or in Chicago or a user group meeting I never felt left out. I had my brothers and sisters and uncles and aunts and they made me feel like I am part of the #SQLfamily.
And the best part is that our #SQLfamily is growing every day. Every reunion we discover more people we can relate to even though we are not related (but in a way we are).
God bless our #SQLfamily!
John Sterrett blog|twitter asked a question about “How do I loop by rowcount in SSIS? I have a table with 5 million IDs and would like to loop to process 10K at a time. #sqlhelp”
This cannot be done natively in SSIS but you could create a stored procedure that uses the concept known as “poor man’s cursor” to split it in batches.
I included a template of code that I use to create smaller transactions that do not block normal users accessing the same tables
Let me know what you think.
/***************************************************************************************
* Name: Poor mans cursor example for batch processing
* Author: Luke Jian - luke(a.t)sqlpositive.com
* Creation Date: 11/02/11
* Purpose: Template for batch processing using Poor man's cursor
***************************************************************************************/
DECLARE @RowStart [INT]
DECLARE @RowEnd [INT]
DECLARE @RowToDo [INT]
DECLARE @RowDone [INT]
DECLARE @BatchSize [INT]
BEGIN
-- Load Users To Enable
IF ( OBJECT_ID('[TempDB]..[#PoorManCursor]') IS NOT NULL )
DROP TABLE [#PoorManCursor]
SELECT --DISTINCT
[ID] = IDENTITY ( [INT], 1, 1)
--.. SOME other rows
INTO [#PoorManCursor]
FROM dbo.BaseTables
--ORDER BY [Whatever column for batch order]
-- Add index to temp table
-- because table may easily exceed 10,000,000 rows
CREATE UNIQUE CLUSTERED INDEX [UCPoorManCursorID] ON [#PoorManCursor] ( [ID] )
-- Set up batching
SELECT @RowStart = 1,
@RowEnd = @BatchSize, --can use 10K
@RowToDo = MAX([ID]),
@RowDone = 0
FROM [#PoorManCursor]
PRINT CAST(@RowToDo AS [VARCHAR]) + ' records to be processed.'
-- Process batch
WHILE ( @RowStart <= @RowToDo )
BEGIN
BEGIN TRANSACTION
-- Load PoorManCursor_Tracking
INSERT INTO [dbo].[SomeTargetTable]
(
[col1],
[col2],
[col3],
[col4]
)
SELECT [col1],
[col2],
[col3],
[col4],
CURRENT_TIMESTAMP
FROM #PoorManCursor
WHERE [ID] >= @RowStart
AND [ID] <= @RowEnd
COMMIT TRANSACTION
PRINT CAST(@RowCount AS [VARCHAR])
+ ' records completed in this batch'
SET @RowDone = @RowDone + @RowCount
SET @RowStart = @RowStart + @BatchSize
SET @RowEnd = @RowEnd + @BatchSize
END
PRINT CAST(@RowDone AS [VARCHAR]) + ' total records processed.'
IF ( OBJECT_ID('[TempDB]..[#PoorManCursor]') IS NOT NULL )
DROP TABLE [#PoorManCursor]
END
I am deeply honored that I was chosen to present 2 sessions at SQL Saturday #96 in Washington, DC this Saturday, November 5, 2011!
Location is:
Microsoft MTC
5404 Wisconsin Ave, Suite 700
Chevy Chase, MD 20815
View Larger Map
Subway: Friendship Heights Metro Station
SQLSaturday is a training event for SQL Server professionals and those wanting to learn about SQL Server. Admittance to this event is free, all costs are covered by donations and sponsorships. Please register soon as seating is limited, and let friends and colleagues know about the event.
If you are attending please stop by and introduce yourself. Also if you are on Twitter don’t forget to use hashtag #sqlsat96 and see what others are doing.
The 2 sessions I will be presenting are :
Optimizing SQL Server I/O with Solid State Drives and Join operations – Explaining Merge,Hash and Loop
Hope to see you there!
Update: 11-09-2011
I have uploaded the slide decks to the session pages:
Thanks to all that attended and don’t forget to leave feedback at:
http://speakerrate.com/sensware
Recently I switched my ISP from AT&T DSL (I know … what was I thinking) back to Comcast. The process went thru very quick and in 30 minutes my old cable modem was hooked up and provisioned and was seeing 23 Mb/s speeds. with 11ms response time.
But… wait. As soon as I started using it I realized that a lot of web pages were taking forever to load and I quickly remembered the old Comcast issue. Unreliable/slow DNS servers. And you need the DNS response before you can actually start HTTP transfer and see the page.
There are a lot of free name servers so I will list a few:
OpenDNS: 208.67.222.222, 208.67.220.220 (my choice)
AT&T and Level 3: 4.2.2.1, 4.2.2.2, 4.2.2.3
Google :8.8.8.8, 8.8.4.4
I would recommend to set the Primary and Secondary from 2 different groups and make these changes on your router so any connected device will inherit the changes.
Problem solved. Happy browsing!
I was selected to present a session Back to basics – How does the Algebrizer work has been selected for SQL Saturday #82 in Indianapolis, IN on Saturday, June 25, 2011!
Location is:
University of Indianapolis-1400 E Hanna Ave
Indianapolis, IN 46227
View Larger Map
SQLSaturday is a training event for SQL Server professionals and those wanting to learn about SQL Server. Admittance to this event is free, all costs are covered by donations and sponsorships. Please register soon as seating is limited, and let friends and colleagues know about the event.
If you are attending please stop by and introduce yourself. Also if you are on Twitter don’t forget to use hashtag #sqlsat82 and see what others are doing.
Hope to see you there!
Last week Erin Stellato (blog| twitter) has asked a very interesting question on Twitter that triggered this post:
So how is it you can have good perf in Oracle, w/o IOTs, but in SQL Server, everyone says need CIs for good perf?
There are two aspects to that question
- What is the difference between index implementations in Oracle vs. SQL Server?
- How do you define good performance?

Until now I have not realized that this subject is not very well covered and since it involves diving into the internals I offered to make an attempt to fill this gap over the Memorial Day long weekend and soon realized that there is more to this than I originally thought it is. This will be the first post in a set that I am planning to write in the next weeks.
First, let’s begin by comparing the different types of indexes supported by Oracle and SQL Server:
So both Oracle and SQL Server support B-tree indexes which are ordered lists of key values, associated with the storage location of the table record that contains the respective value. So far it doesn’t seem to be a big difference between the two RDBMS. Both also support B+ tree indexes although there are differences between the two implementations. B+ trees are similar to B-trees but the table records are stored in the leaf nodes of the primary key index. This allows fast access for singleton searches (exact match) or range searches on the primary key of the table. Microsoft calls them clustered indexes (CI) while Oracle uses the term Index Organized Tables (IOT).
Let’s look at non-clustered indexes which are the default indexes in Oracle. Here the two implementations are quite similar: Indexes are using a B-tree structure that contains the key values and a unique reference to the storage location of the record. This is where the implementations start to separate:
In Oracle every record has a unique (inside the table) 10 byte pseudocolumn ROWID representing the physical location on disk that the record is stored. When printed, each ROWID is displayed in BASE64 format (A-Za-z0-9+/): AAAAaoAATAAABrXAAA
ROWID has a four-piece format, OOOOOOFFFBBBBBBRRR:
OOOOOO: The data object number that identifies the database segment (AAAAao in the example). Schema objects in the same segment, such as a cluster of tables, have the same data object number.
FFF: The tablespace-relative datafile number of the datafile that contains the row (file AAT in the example).
BBBBBB: The data block that contains the row (block AAABrX in the example). Block numbers are relative to their datafile, not tablespace. Therefore, two rows with identical block numbers could reside in two different datafiles of the same tablespace.
RRR: The row in the block.
Users can use ROWIDs directly for several functions:
- Rowids are the fastest way of accessing rows.
- Rowids can be used to see how a table is organized.
- Rowids uniquely identify rows in a given table.
There are two completely different perspectives when migrating from one RDBMS to another:
Most Oracle users have no concept of clustered or non clustered indexes because primary key indexes created in Oracle are balanced non-clustered indexes that include ROWID for fast access.
A common mistake that Oracle users make is assume that the indexes are implemented the same way in SQL server and underestimate the performance hit that comes with using a large composite primary key in SQL server on a Clustered table. It takes a while to understand that the non clustered index size index size in SQL server can be exponentially bigger in SQL server and is not necesarly proportional to the size of the columns specified in the non clustered index. Based on this, some Oracle experts say that the clustered indexes in SQL Sever have a 110% overhead because index key is stored “twice”, once with the index entry in the intermediate index nodes – and again with the row data at the leaf level.
Another performance hit can come from the default value of the free space (PCTFREE) that is left on a page as Oracle leave by default 10% space on each block (at 91% block is split). After a data block is filled to the limit determined by PCTFREE, Oracle considers the block unavailable for the insertion of new rows until the percentage of that block falls beneath the parameter PCTUSED.
The alternative of using a non clustered table (heap) is not recommended in most cases and Paul Randal ( blog | Twitter ) has some excellent posts on this subject as well as Erin’s recent blog post on Internals and Deletes for SQL University which I highly recommend reading.
On the opposite camp, most SQL Server users are looking for an alternative to clustered indexes and end up choosing to use Index Optimized Tables (IOT) instead of using simple tables just because they search for something that has the same structure as clustered tables in SQL Server in order to have comparable performance.
In the next part of my series I will continue with access methods in Oracle, and other index types.
Welcome to the first Meme Monday! This idea was brought to life by Thomas Larock (Blog|Twitter).
Here’s mine
Real DBAs never use DBCC SHRINKDATABASE.Because they know better!
I got this in the email a while ago and looking at it again, I had to post it because it is too funny.
Pessimist: “The glass is half empty.”
Optimist: “The glass is half full.”
Database Administrator: “The glass is sized correctly for peak loads.”
Virtualization Admin: “This person should be sharing a Dixie Cup with the guys in accounting.”
Business Intelligence Admin: “How can we upsell this customer to bottled water?”
Security Admin: “Whose glass is this? Half of the water has gone missing.”
Storage Admin: “This glass needs thin provisioning.”
ETL Admin: “We need a faster way to fill this glass.”
Active Directory Admin: “We need a secondary glass for redundancy.”
Backup Admin: “Nobody told me to take a snapshot of the full glass.”
Break Fix Admin: “Hang on, I’ll go get you another glass.”
Linux Admin: “I bet I could build a better glass myself.”
Project Manager: “I need a FTE for 15 minutes to accomplish the deliverable of a full glass.”
Azure Developer: “Why buy your own glass? The clouds are full of water.”
jQuery Developer: “When you start drinking, it automatically refills.”
Mac User: “Check out how much thinner and smaller my glass is, and it holds that much water too.”
Twitter: “The faucet is over capacity. Please wait a moment and try again.”
Today Mladen Prajdic posted a on Twitter a SQL statement that captured my attention: (http://twitter.com/MladenPrajdic/status/9878219727)
SELECT
CONVERT(varchar(50), N'æøåáäĺćçčéđńőöřůýţžš')
COLLATE Cyrillic_General_CI_AI
returns aoaaalcccednooruytzs
If you notice the conversion to Cyrillic_General_CI_AI collation his shows an interesting mechanism for code page translation that we could not explain:
Why does Cyrillic_General_CI_AI or Greek_CI_AI collation successfully convert accents to their base ascii chars?
I found an answer to this on this KB page: http://support.microsoft.com/kb/153449
What Happens when a Character on One Code Page is Unavailable on Another
If a conversion occurs and a character on one code page is unavailable on another, the character gets converted to its closest equivalent character on the other code page. For example, ASCII 224 (“alpha”) on code page 437 is converted to 97(“a”) on the ANSI code page. When you convert it back to code page 437, you receive 97(“a”). This is a problem, because you have now lost the original 224(“alpha”). This can potentially cause incorrect data to be saved on your server, as happens in the following two examples:
Example 1:
You have an OEM server in which you are inserting data from an ANSI client, and your AutoAnsiToOem option is enabled. If the character that you are inserting is valid in both code pages, the correct data will be saved. However, if the character is valid in ANSI but not in OEM, the correct data will not be saved.
Example 2:
You insert data into an ANSI server from an OEM client, and the AutoAnsiToOem option is enabled. If the character that you are inserting is valid in both code pages, the correct data will be saved. However, if the character is valid in OEM and not in ANSI, then correct data will not be saved.
Nevertheless I this is very useful to exotic strip accents from strings (i.e before an export).
After attending a presentation on SQL Azure by Microsoft’s George Huey an interesting topic came to my mind in regards to the actual use for a cloud based database that has over 1400 incompatibilities with SQL Server: What’s the use for this since it is not a viable alternative to a production database.
I would say that it is an alternative to MS Access!!!
Let me explain that. I definitely see the use for it when you need to deploy a solution quickly for an event or conference and you need a reliable solution for the backend of your application that fits a tight budget and an even tighter schedule or even as George suggested for a startup that has no money for a datacenter hosted db environment.
in these situations deploying a 1GB database to host non-confidential information for under $10/mth seems a no-brainer. But I would not recommend this when HIPAA compliance is required for instance.
My thanks to George for a great presentation.
|
|