Wednesday, September 15, 2010

One big cube or multiple cubes?

Found a good article by a Chris Webb (SQL Server MVP) who knows quite a bit about analysis services.  This answers the question of "One cube or Multiple?".   Good reading for bigger more complex AS setups.


http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!7728.entry

Tuesday, September 14, 2010

Speeding up scalar functions in SQL Server

Our warehouse uses a UDF to check for dates outside of the acceptable range and assign them the designated “unknown dates”.  Why do it in SQL you ask (vs SSIS)?  Because it is an easy way to have control in one central place.  I made a change that pushed the date ranges from hardcoded in a function to table driven so it can be shared across multiple clients and allow them to retain their custom settings during deployments.  It was like someone hit the breaks – sure the queries were very light (7 rows in the table), but calling it several times in each ETL sproc blew it out to 20-30 million times in a single ETL job. Sure the cache hit rate was 100%, but it still added almost 50% to the total ETL time.  What to do… what to do…

I ran across a great article by Itzik Ben-Gan (http://www.sqlmag.com/article/sql-server/inline-scalar-functions.aspx) on how to make your functions perform extremely fast – by simply changing them from scalar to table.  He goes into the details of why it works – I will simply state the benchmarks:

This is calling the function as a scalar with the syntax of:

[DateField] = dbo.fn_CleanDate([column])

CPU
Reads
Writes
Duration
491,918
62,832,254
1
493,181,940

Those are some big numbers…  so if we rewrite the function so that it returns a table with a single field, the changes resulting:

[DateField] = (select value from dbo.fn_SelectCleandate([column]))

CPU
Reads
Writes
Duration
330,176
41,901,480
0
330,524,577

Now for the kicker.  If you select your date values higher up in the sproc and pass them into the function so there is no data access required in the sub-select it performs.  Blazing fast.  Yeah, it might clutter up the syntax a bit, but it is definitely worth it!

[DateField] = (select value from dbo.fn_SelectCleanDate([column], @minDate, @maxDate,@minUnknownDate, @maxUnknownDate)),

CPU
Reads
Writes
Duration
1,670
4,578
0
308


That deserves restating.  308 milliseconds vs. 493 thousand.