New Beginnings – Big Data to Find a Cure for Healthcare

I know it’s been a while since my last post here.
Next week is PASS Summit, and it will mark 3 years since my last session at PASS Summit 2015. I have learned a lot since then and joined a new company as a Data Architect to help build a new web analytics and audience identity product. I shifted my focus from SQL Server on-premises to the AWS cloud. I learned a lot about AWS, migrated instances to SQL Server on RDS as well as traditional EC2 instances, and then shifted into Big Data on AWS, working with an amazing team to build a data lake for the new product I helped develop over the last 3 years. Earlier this year I passed the exam to become an AWS Certified Solutions Architect – Associate, and I’m working on getting the Big Data – Specialty certification.

But next week marks a new chapter in my life, and I cannot contain my excitement. I will be joining a leading healthcare decision support technology company as their Big Data Architect and work with another great team to build a cloud product that I hope will help solve one of the biggest problems in America: the cost of healthcare. Needless to say, this is a subject I am extremely passionate about. The greatest country in the world has a deep fundamental problem: the majority of healthcare providers, both individuals and corporations, don’t know how much a medical procedure should cost. This means that if you ask a doctor before a procedure how much it will cost, the answer will be “I DON’T KNOW”. We developed an industry that has providers on one side wanting to charge more to cover their costs and be profitable, because they do not have detailed information, and on the other side medical insurance wanting to negotiate the same cost down to stay competitive. But regardless of whether you are insured, uninsured, or underinsured, the costs of healthcare will shift towards those who pay: the patients and their employers.

Healthcare is complex

In all honesty, accurately estimating the cost of healthcare is not a simple problem, and I believe it is a global problem. Even in countries that have standardized costs for every procedure, everyone is different. The standardized cost is just an average that does not account for factors like equipment maintenance costs, the healthcare professional’s experience, legal costs, or unknown side effects that can occur following the procedure. And like any average, it can change year over year, and it does. If you approximate your car expenses based on the average maintenance and fuel cost of the first 3 years of owning your first car, and then you buy a more expensive model, you would expect the cost to change; it would not be accurate to use the old car’s numbers, even with some padding, to estimate the new cost. Imagine a world where you can use business intelligence tools to analyze all the different costs in real time and gain insights that help you rapidly make decisions to save money, budget more accurately, and maybe ultimately improve the outcomes and lives of patients. Can you tell I’m excited?

I promise I will start to write more about the technologies I’ve used, healthcare costs, and other things I’ve learned, as well as new technologies that I have yet to discover and implement.

A tale of two cities: Presenting Anatomy of a Join at SQL Saturday #160 and #149

The next 10 days will be pretty busy, as I will be presenting my session “Anatomy of a Join” at SQL Saturday #160 in Kalamazoo, MI on Saturday, September 22, 2012, and the following week at SQL Saturday #149 in Minneapolis, MN on Saturday, September 29, 2012.

SQLSaturday is a training event for SQL Server professionals and those wanting to learn about SQL Server. Admittance to this event is free; all costs are covered by donations and sponsorships. Please register soon, as seating is limited, and let friends and colleagues know about the event.
If you are attending, please stop by and introduce yourself. Also, if you are on Twitter, don’t forget to use the hashtag #sqlsat160 for Kalamazoo or #sqlsat149 for Minneapolis and see what others are doing.

Hope to see you there!

Presenting at SQL Saturday #119 Chicago

I will be presenting my session on “Optimizing SQL Server I/O with Solid State Drives” at SQL Saturday #119 in Chicago on Saturday, May 19, 2012!

Since this is the 4th time I’m presenting this session and I keep accumulating ideas related to Solid State Drives, I decided to start a series: One SSD Technology a Day. Stay tuned!

Location is:
DeVry University – Addison Campus,
1221 North Swift Road, Addison, IL 60101-6106


SQLSaturday is a training event for SQL Server professionals and those wanting to learn about SQL Server. Admittance to this event is free; all costs are covered by donations and sponsorships. Please register soon, as seating is limited, and let friends and colleagues know about the event.
If you are attending, please stop by and introduce yourself. Also, if you are on Twitter, don’t forget to use the hashtag #sqlsat119 and see what others are doing.

Hope to see you there!

Weekly links for 11-21-2011

SQL Server 2012 Release Candidate is now Available from SQL Server Team blog

The Data Scientist from Buck Woody

7 Scaling Strategies Facebook Used to Grow to 500 Million Users from HighScalability

10 Sloppy Social Media Mistakes to Fix NOW from Hubspot Blog

What is Hadoop? And Why is it Good for Big Data? from The Data Roundtable

Hacker Says Texas Town Used Three Character Password To Secure Internet Facing SCADA System from ThreatPost

The AdventureWorks2008R2 Books Online Random Workload Generator from Jonathan Kehayias

We are family!

This is the second time I’m writing a post in the “Meme Monday” series, and this time it originates from Tom LaRock (blog | twitter). The topic is: What #SQLfamily means to me.

This weekend I participated in a #SQLfamily reunion in Washington, DC. Some might argue that this is called SQL Saturday #96, but I prefer to see it as a family reunion, because that is how it felt. It started with a #SQLfamily dinner where we had a good time telling stories (some would even make a good Agatha Christie book)… but I digress. We all sat around the table and felt like we were part of a family. I met Tom LaRock, Joey D’Antoni, Karen Lopez, Andy Leonard, Brian Moran, Aaron Nelson, Matt Velic, Chris Skorlinski, Wayne Sheffield, Sebastian Meine and many others. It truly felt like a family reunion, and it was.
On Saturday we continued: we had fun presenting our sessions, learning new things, hanging out in the speaker room and talking about all kinds of things. Everybody contributed to making this #SQLfamily reunion an event to remember. Tom even offered to make coffee but had no idea what to do with the extension cord. (…after the last power outage they had in New England, you would think he should know better 🙂 )

Looking back, I realize that every #SQLfamily event I have participated in has made me feel like this. Whether it was a SQL Saturday in Indianapolis or in Chicago, or a user group meeting, I never felt left out. I had my brothers and sisters and uncles and aunts, and they made me feel like I am part of the #SQLfamily.

And the best part is that our #SQLfamily is growing every day. At every reunion we discover more people we can relate to, even though we are not related (but in a way we are).

God bless our #SQLfamily!

How to: Poor man’s cursor template for batch processing

John Sterrett (blog | twitter) asked a question: “How do I loop by rowcount in SSIS? I have a table with 5 million IDs and would like to loop to process 10K at a time. #sqlhelp”

This cannot be done natively in SSIS, but you could create a stored procedure that uses the concept known as a “poor man’s cursor” to split the work into batches.
Below is a template of the code I use to create smaller transactions that do not block normal users accessing the same tables.

Let me know what you think.

/***************************************************************************************
 * Name:          Poor man's cursor example for batch processing
 * Author:        Luke Jian - luke(a.t)sqlpositive.com
 * Creation Date: 11/02/11
 * Purpose:       Template for batch processing using a poor man's cursor
 ***************************************************************************************/
    DECLARE @RowStart  [INT]
    DECLARE @RowEnd    [INT]
    DECLARE @RowToDo   [INT]
    DECLARE @RowDone   [INT]
    DECLARE @RowCount  [INT]
    DECLARE @BatchSize [INT]

    SET @BatchSize = 10000  -- rows per batch; adjust as needed

    BEGIN
        -- Stage the rows to process
        IF ( OBJECT_ID('[TempDB]..[#PoorManCursor]') IS NOT NULL )
            DROP TABLE [#PoorManCursor]

        SELECT  --DISTINCT
                [ID] = IDENTITY( [INT], 1, 1 )
                --, ... other columns needed by the batch
        INTO    [#PoorManCursor]
        FROM    dbo.BaseTables
        --ORDER BY [whatever column defines the batch order]

        -- Add an index to the temp table,
        -- because it may easily exceed 10,000,000 rows
        CREATE UNIQUE CLUSTERED INDEX [UCPoorManCursorID] ON [#PoorManCursor] ( [ID] )

        -- Set up batching
        SELECT  @RowStart = 1,
                @RowEnd   = @BatchSize,
                @RowToDo  = MAX([ID]),
                @RowDone  = 0
        FROM    [#PoorManCursor]

        PRINT CAST(@RowToDo AS [VARCHAR]) + ' records to be processed.'

        -- Process one batch per iteration, each in its own short transaction
        WHILE ( @RowStart <= @RowToDo )
        BEGIN
            BEGIN TRANSACTION

            -- Load the target table for this ID range
            INSERT INTO [dbo].[SomeTargetTable]
            (
                [col1],
                [col2],
                [col3],
                [col4],
                [LoadDate]   -- e.g. a load timestamp
            )
            SELECT  [col1],
                    [col2],
                    [col3],
                    [col4],
                    CURRENT_TIMESTAMP
            FROM    [#PoorManCursor]
            WHERE   [ID] >= @RowStart
                AND [ID] <= @RowEnd

            -- Capture @@ROWCOUNT now; COMMIT TRANSACTION resets it
            SET @RowCount = @@ROWCOUNT

            COMMIT TRANSACTION

            PRINT CAST(@RowCount AS [VARCHAR]) + ' records completed in this batch'

            SET @RowDone  = @RowDone + @RowCount
            SET @RowStart = @RowStart + @BatchSize
            SET @RowEnd   = @RowEnd + @BatchSize
        END

        PRINT CAST(@RowDone AS [VARCHAR]) + ' total records processed.'

        IF ( OBJECT_ID('[TempDB]..[#PoorManCursor]') IS NOT NULL )
            DROP TABLE [#PoorManCursor]
    END

Presenting 2 sessions at SQL Saturday #96 in Washington, DC

I am deeply honored that I was chosen to present 2 sessions at SQL Saturday #96 in Washington, DC this Saturday, November 5, 2011!

Location is:
Microsoft MTC
5404 Wisconsin Ave, Suite 700
Chevy Chase, MD 20815



Subway: Friendship Heights Metro Station

SQLSaturday is a training event for SQL Server professionals and those wanting to learn about SQL Server. Admittance to this event is free; all costs are covered by donations and sponsorships. Please register soon, as seating is limited, and let friends and colleagues know about the event.
If you are attending, please stop by and introduce yourself. Also, if you are on Twitter, don’t forget to use the hashtag #sqlsat96 and see what others are doing.

The 2 sessions I will be presenting are:

Optimizing SQL Server I/O with Solid State Drives and Join operations – Explaining Merge, Hash and Loop

Hope to see you there!

Update: 11-09-2011

I have uploaded the slide decks to the session pages.

Thanks to all who attended, and don’t forget to leave feedback at:
http://speakerrate.com/sensware

How to improve your Comcast response time?

Recently I switched my ISP from AT&T DSL (I know… what was I thinking) back to Comcast. The process went through very quickly, and in 30 minutes my old cable modem was hooked up and provisioned, and I was seeing 23 Mb/s speeds with 11 ms response times.
But… wait. As soon as I started using it, I realized that a lot of web pages were taking forever to load, and I quickly remembered the old Comcast issue: unreliable/slow DNS servers. You need the DNS response before you can actually start the HTTP transfer and see the page.

There are a lot of free name servers, so I will list a few:

OpenDNS: 208.67.222.222, 208.67.220.220 (my choice)
AT&T and Level 3: 4.2.2.1, 4.2.2.2, 4.2.2.3
Google: 8.8.8.8, 8.8.4.4

I would recommend setting the primary and secondary servers from 2 different groups, and making these changes on your router so any connected device will inherit them.

Problem solved. Happy browsing!

What’s the big difference between Oracle and SQL Server indexes (Part I)

This entry is part 1 of 1 in the series SQL Server vs. Oracle

Last week Erin Stellato (blog | twitter) asked a very interesting question on Twitter that triggered this post:

So how is it you can have good perf in Oracle, w/o IOTs, but in SQL Server, everyone says need CIs for good perf?

There are two aspects to that question:

  1. What is the difference between index implementations in Oracle vs. SQL Server?
  2. How do you define good performance?

Until now I had not realized that this subject is not very well covered, and since it involves diving into the internals, I offered to make an attempt to fill this gap over the Memorial Day long weekend. I soon realized that there is more to this than I originally thought. This will be the first post in a series that I am planning to write over the next few weeks.

First, let’s begin by comparing the different types of indexes supported by Oracle and SQL Server:

Type of index             SQL Server 2008 R2                   Oracle 11g
B-tree indexes            Yes (Non-clustered)                  Yes (Default)
B+ tree indexes           Yes (Clustered indexes)              Yes (IOT)
Function-based indexes    Yes (Column has to exist in table)   Yes
Filtered indexes          Yes                                  Yes
Hash cluster indexes      No                                   Yes
B-tree cluster indexes    No                                   Yes
Bitmap indexes            No                                   Yes

So both Oracle and SQL Server support B-tree indexes, which are ordered lists of key values, each associated with the storage location of the table record that contains the respective value. So far there doesn’t seem to be a big difference between the two RDBMSs. Both also support B+ tree indexes, although there are differences between the two implementations. B+ trees are similar to B-trees, but the table records are stored in the leaf nodes of the primary key index. This allows fast access for singleton searches (exact match) or range searches on the primary key of the table. Microsoft calls them clustered indexes (CI), while Oracle uses the term Index Organized Tables (IOT).
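
To make the comparison concrete, here is a minimal sketch of how each dialect declares this structure (the Orders table and constraint names are hypothetical, for illustration only):

    -- SQL Server: a clustered primary key stores the table rows
    -- in the leaf level of the index B+ tree
    CREATE TABLE dbo.Orders
    (
        OrderID   INT           NOT NULL,
        OrderDate DATETIME      NOT NULL,
        Amount    DECIMAL(10,2) NULL,
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED ( OrderID )
    );

    -- Oracle: the equivalent structure is an Index Organized Table
    CREATE TABLE Orders
    (
        OrderID   NUMBER        NOT NULL,
        OrderDate DATE          NOT NULL,
        Amount    NUMBER(10,2),
        CONSTRAINT PK_Orders PRIMARY KEY ( OrderID )
    )
    ORGANIZATION INDEX;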

Now let’s look at non-clustered indexes, which are the default index type in Oracle. Here the two implementations are quite similar: indexes use a B-tree structure that contains the key values and a unique reference to the storage location of the record. This is where the implementations start to diverge:

In Oracle, every record has a 10-byte ROWID pseudocolumn, unique within the table, that represents the physical location on disk where the record is stored. When printed, each ROWID is displayed in BASE64 format (A-Za-z0-9+/), for example: AAAAaoAATAAABrXAAA
ROWID has a four-piece format, OOOOOOFFFBBBBBBRRR:

  • OOOOOO: The data object number that identifies the database segment (AAAAao in the example). Schema objects in the same segment, such as a cluster of tables, have the same data object number.
  • FFF: The tablespace-relative datafile number of the datafile that contains the row (file AAT in the example).
  • BBBBBB: The data block that contains the row (block AAABrX in the example). Block numbers are relative to their datafile, not tablespace. Therefore, two rows with identical block numbers could reside in two different datafiles of the same tablespace.
  • RRR: The row in the block.

Users can use ROWIDs directly for several functions:

  • Rowids are the fastest way of accessing rows.
  • Rowids can be used to see how a table is organized.
  • Rowids uniquely identify rows in a given table.
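
If you are curious, you can pull a ROWID apart yourself with Oracle’s DBMS_ROWID package. A minimal sketch (the emp table is hypothetical):

    -- Oracle: decode the pieces of a ROWID
    SELECT ROWID,
           DBMS_ROWID.ROWID_OBJECT(ROWID)       AS data_object_no,
           DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) AS file_no,
           DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) AS block_no,
           DBMS_ROWID.ROWID_ROW_NUMBER(ROWID)   AS row_no
    FROM   emp
    WHERE  ROWNUM <= 3;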

There are two completely different perspectives when migrating from one RDBMS to the other.
Most Oracle users have no concept of clustered or non-clustered indexes, because primary key indexes created in Oracle are balanced non-clustered indexes that include the ROWID for fast access.

A common mistake Oracle users make is to assume that indexes are implemented the same way in SQL Server, and to underestimate the performance hit that comes with using a large composite primary key on a clustered table in SQL Server. It takes a while to understand that non-clustered index size in SQL Server can be much bigger than expected and is not necessarily proportional to the size of the columns specified in the non-clustered index. Based on this, some Oracle experts say that clustered indexes in SQL Server have a 110% overhead, because the index key is stored “twice”: once with the index entry in the intermediate index nodes, and again with the row data at the leaf level.
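
One way to observe this effect is to compare per-index sizes, since every non-clustered index leaf row carries the clustering key, so a wide clustered key inflates all of them. A rough T-SQL sketch (dbo.SomeTable is a placeholder):

    -- T-SQL: per-index size in KB for one table
    SELECT  i.name                      AS index_name,
            i.type_desc,
            SUM(ps.used_page_count) * 8 AS size_kb
    FROM    sys.dm_db_partition_stats AS ps
    JOIN    sys.indexes AS i
            ON  i.[object_id] = ps.[object_id]
            AND i.index_id    = ps.index_id
    WHERE   ps.[object_id] = OBJECT_ID(N'dbo.SomeTable')
    GROUP BY i.name, i.type_desc;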

Another performance hit can come from the default amount of free space (PCTFREE) left on a page: Oracle by default leaves 10% free space on each block (at 91% full the block is split). After a data block is filled to the limit determined by PCTFREE, Oracle considers the block unavailable for the insertion of new rows until the percentage of space used in that block falls beneath the PCTUSED parameter.
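
For comparison, the closest SQL Server equivalent to PCTFREE is the FILLFACTOR option, which is set per index rather than per table. A minimal sketch with hypothetical names:

    -- Oracle: reserve 10% of each block for future row growth
    CREATE TABLE OrderHistory
    (
        OrderID NUMBER NOT NULL,
        Notes   VARCHAR2(400)
    )
    PCTFREE 10
    PCTUSED 40;

    -- SQL Server: leave 10% free space in index pages when the
    -- index is built or rebuilt
    CREATE CLUSTERED INDEX CIX_OrderHistory
        ON dbo.OrderHistory ( OrderID )
        WITH ( FILLFACTOR = 90 );

Note that FILLFACTOR is only applied when the index is built or rebuilt, whereas Oracle honors PCTFREE on every insert.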

The alternative of using a non-clustered table (heap) is not recommended in most cases. Paul Randal (blog | Twitter) has some excellent posts on this subject, as does Erin’s recent blog post on Internals and Deletes for SQL University, which I highly recommend reading.

In the opposite camp, most SQL Server users migrating to Oracle look for an alternative to clustered indexes and end up choosing Index Organized Tables (IOT) instead of simple tables, just because they want something that has the same structure as clustered tables in SQL Server, hoping for comparable performance.

In the next part of this series I will continue with access methods in Oracle and other index types.

Meme Monday

Welcome to the first Meme Monday! This idea was brought to life by Thomas Larock (Blog|Twitter).

Here’s mine

Real DBAs never use DBCC SHRINKDATABASE. Because they know better!

Scorpions – Last album

For those of us who come from an Eastern European country, Scorpions is a legend. Their songs have made us strong and given us the force to go forward like a “Wind of Change”. But sadly, they have announced that their newest album, “Sting in the Tail”, is also their last.

Thank you for keeping us energized for so many decades!

Important Message to our Fans
It was always our pleasure, our purpose in life, our passion and we were fortunate enough to make music for you – whether it was live on stage or in the studio, creating new songs.
While we were working on our album these past few months, we could literally feel how powerful and creative our work was – and how much fun we were still having, in the process. But there was also something else: We want to end the Scorpions’ extraordinary career on a high note. We are extremely grateful for the fact that we still have the same passion for music we’ve always had since the beginning. This is why, especially now, we agree we have reached the end of the road. We finish our career with an album we consider to be one of the best we have ever recorded and with a tour that will start in our home country Germany and take us to five different continents over the next few years.
We want you, our fans, to be the first to know about this. Thank you for your never-ending support throughout the years!

http://www.the-scorpions.com/english/

Marrons Chauds, Marrons Chauds

Tonight we started a fire in the outdoor fireplace, enjoyed its warmth, and in the end I roasted chestnuts in the hot embers. It’s funny how memories are triggered by simple things like a smell or an object.


All of a sudden, while roasting the chestnuts, my mind wandered back in time to when I was a student in Paris. At this time of year, once the cold weather comes, a lot of street vendors start selling roasted chestnuts. They yell “Marrons chauds! Marrons chauds!” trying to get people to buy their chestnuts. This was an interesting experience, since it was the first time I had roasted chestnuts myself. Some got burned, but most of them were delicious, and we had a good time around the fire.