SQL Server 2012 Columnstore Index In Depth Part 1: Make a bigger AdventureworksDW database for Testing

April 13, 2012 at 7:00 AMSteven Wang

The new xVelocity memory optimized columnstore index feature in SQL server 2012 are designed to deliver blazing-fast performance for large data warehouse quries. But I believe that for many people who are interested in this fantastic new feature might have a problem to find a big sample data warehouse database to try the new columnstore features, like batch-mode processing, segment elimination, the size of memory grant etc.

The new sample data warehouse database: AdventureworksDW2012, which you can download from msftdbprodsamples.codeplex.com is not big enough for us to experience many features of columnstore index. We need to make it bigger to explore the columnstore index features.

Once the AdventureworksDW2012 Data file is downloaded, use the below script to attach the database:

USE [master]
GO
Declare @Filepath nvarchar(128)
Declare @SQL nvarchar(max)

Set	@Filepath = 'H:\Applications\database\SQL 2012\SampleDBs\adventureworks\' 
				--Change the path to your file path
				--Where your downloaded AdventureWorksDW2012_Data.mdf is located

Set @Filepath = IIF(Right(@Filepath,1) = '\', @Filepath, @Filepath + '\') + 'AdventureWorksDW2012_Data.mdf'

Set	@SQL = 'CREATE DATABASE [AdventureWorksDW2012] ON ( FILENAME = ''' + @Filepath + ''') FOR ATTACH'

Exec (@SQL)
GO

 

THe below diagram shows the table size of the AdventureworksDW2012 sample database: 

The biggest table in AdventureworksDW2012 is FactProductInventory with 776286 rows. The table is too narrow to be used for columnstore index testing. and the row size below 1 million is also too small.

A good candidate for creating the nonclustered columnstore index on will be FactResellerSales. But this table has only got 60,855 rows. In order to get a good performance comparison between the B-Tree index and columnstore index and other columnstore features , a table with about 100 millions of rows will be ideal. However, in most case, we might don't have such capacity in our laptop or home computer. I decide to pump up this table to over 10 millions of rows as this size will be suitable for most of users and it is also good enough for us to experience the most of columnstore indexes.

First of all, we don't want to simply duplicate the existing rows as the repeating data in the table will skew the view of the query performance we get from the columnstore index. By analyzing the FActResellserSales table, the easiest way is to bigger the DimReseller table as a particualar salesordernumber in the table is bound to a reseller. I will repeat the each salesorder line (salesorder number + salesorderlinenumber is the primary key of FactSalestable) 180 times with random picked reseller and randomized order quantity to pump up the table to 10,953,900 rows.

As originally the DimReseller has only 701 rows, I will first make a bigger DimReseller table. The below script will make a DimResellerBig table which is 50 times bigger than DimReseller table:

USE [AdventureworksDW2012]
GO
--Create a bigger table for DimReseller
SELECT 
		Cast(Row_Number() Over(Order by (Select 0)) as int) As [ResellerKey]
		, A.[GeographyKey]
		, Cast('AW' + Format(Row_Number() Over(Order by (Select 0)), '0000000000') as Nvarchar(15)) As [ResellerAlternateKey]
		, A.[Phone]
		, A.[BusinessType]
		, Cast(A.[ResellerName] +  Format(Row_Number() Over(Order by (Select 0)), ' 0000000000') As nvarchar(50)) As ResellerName
		, A.[NumberEmployees]
		, A.[OrderFrequency]
		, A.[OrderMonth]
		, A.[FirstOrderYear]
		, A.[LastOrderYear]
		, A.[ProductLine]
		, A.[AddressLine1]
		, A.[AddressLine2]
		, A.[AnnualSales]
		, A.[BankName]
		, A.[MinPaymentType]
		, A.[MinPaymentAmount]
		, A.[AnnualRevenue]
		, A.[YearOpened]
Into	DimResellerBig
FROM	[dbo].[DimReseller] A
			Cross Join
		Master..spt_values B
Where	B.Type = 'P'
			And
		B.Number Between 1 and 50
GO


--Add indexes and constraints for DimResellerBig
Alter Table dbo.DimResellerBig Alter Column [ResellerKey] Int Not Null;

ALTER TABLE [dbo].[DimResellerBig] ADD  CONSTRAINT [PK_DimResellerBIG_ResellerKey] PRIMARY KEY CLUSTERED 
(
	[ResellerKey] ASC
);

ALTER TABLE [dbo].[DimResellerBig] ADD  CONSTRAINT [AK_DimResellerBig_ResellerAlternateKey] UNIQUE NONCLUSTERED 
(
	[ResellerAlternateKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_DimResellerBig_GeographyKey] ON [dbo].[DimResellerBig]
(
	[GeographyKey] ASC
);

GO

 

Once I have the DimResellerBig table, I can use the below script to make my big FactResellerSalesBig table:

USE [AdventureWorksDW2012]
GO
--The script is used to make a bigger FactResellerSales table, called FactResellerSalesBig.
--When a RowMultiplier 180 is used the FactResellerSalesBig table will be 10,953,900 rows
--The query takes 25 minutes to run in my laptop with a 4-core processor, a 7200 rpm hard disk and 8 GB RAM


Declare		@OrderCountTable Table (OrderDateKey INT, OrderCount Int)
Declare		@OrderDateKey Int = 2012
			, @OrderCount int
			, @RowMultiplier int = 180	--180 times more rows than the original FactResellerSales Table
										--you can make it bigger or smaller by change this number

Insert Into	@OrderCountTable
SELECT		Distinct [OrderDateKey]
			, Count(Distinct SalesOrderNumber) As OrderCount
FROM		[dbo].[FactResellerSales]
Group by	[OrderDateKey];


If Object_ID('[dbo].[FactResellerSalesBig]') Is Not Null
		Drop Table [dbo].[FactResellerSalesBig];

Select	Top(0) *
Into	FactResellerSalesBig
From	[dbo].[FactResellerSales];

While Exists (Select * From @OrderCountTable)
	Begin
		Select	Top(1) 
				@OrderDateKey = OrderDateKey
				,  @OrderCount =  OrderCount
		From	@OrderCountTable;

		Insert into FactResellerSalesBig with(Tablock)
		Select
				R.[ProductKey]
				, R.[OrderDateKey]
				, R.[DueDateKey]
				, R.[ShipDateKey]
				, Y.[ResellerKey]
				, R.[EmployeeKey]
				, R.[PromotionKey]
				, R.[CurrencyKey]
				, Y.[SalesTerritoryKey]
				, Cast(R.[SalesOrderNumber] + Format(Y.RowNum, '000') AS nvarchar(20)) As SalesOrderNumber
				, R.[SalesOrderLineNumber]
				, R.[RevisionNumber]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) As OrderQuantity
				, R.[UnitPrice]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] As ExtendedAmount
				, R.[UnitPriceDiscountPct]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] * R.[UnitPriceDiscountPct] As DiscountAmount
				, R.[ProductStandardCost]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) *  R.[ProductStandardCost] As TotalProductCost
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] * (1 - R.[UnitPriceDiscountPct]) As SalesAmount
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] * (1 - R.[UnitPriceDiscountPct]) * 0.08 As TaxAmt
				, R.[Freight]
				, Cast(R.[CarrierTrackingNumber] +  Format(Y.RowNum, '-000') As nvarchar(25)) As CarrierTrackingNumber
				, Cast(R.[CustomerPONumber] + Format(Y.RowNum, '000') AS nvarchar(25)) As CustomerPONumber
				, R.[OrderDate]
				, R.[DueDate]
				, R.[ShipDate]
		From
			(
				SELECT 
						[ProductKey]
						,[OrderDateKey]
						,[DueDateKey]
						,[ShipDateKey]
						,[ResellerKey]
						,[EmployeeKey]
						,[PromotionKey]
						,[CurrencyKey]
						,[SalesTerritoryKey]
						,[SalesOrderNumber]
						,[SalesOrderLineNumber]
						,[RevisionNumber]
						,[OrderQuantity]
						,[UnitPrice]
						,[ExtendedAmount]
						,[UnitPriceDiscountPct]
						,[DiscountAmount]
						,[ProductStandardCost]
						,[TotalProductCost]
						,[SalesAmount]
						,[TaxAmt]
						,[Freight]
						,[CarrierTrackingNumber]
						,[CustomerPONumber]
						,[OrderDate]
						,[DueDate]
						,[ShipDate]
						, Dense_Rank() Over(Partition by [OrderDateKey] Order by SalesOrderNumber) As OrderNumber
				FROM	[dbo].[FactResellerSales] 
				Where	OrderDateKey =  @OrderDateKey
			) R
					Cross Apply
			(

				SELECT	TOP (@RowMultiplier) 
						A.[ResellerKey]
						, B.SalesTerritoryKey
						, Row_Number() Over(Order by Checksum(newid())) As RowNum
						, RAND(CHECKSUM(NEWID())) * 2 As QuantityMultiplier
				FROM	[DimResellerBig] A
							Inner join
						[dbo].[DimGeography] B
							on A.[GeographyKey] = B.GeographyKey
							Cross Join
						Master..spt_values C
				Where	C.Type = 'P'
							And
						C.Number Between 1 and @OrderCount
						and R.OrderNumber = C.number		
			) Y

		Print 'The records for the order date: ' + Cast(@OrderDateKey as nvarchar(8)) + ' has multiplied ' + Cast(@RowMultiplier as nvarchar(6)) + ' times';

		Delete Top(1) From	@OrderCountTable;

	End

Go

Create Clustered Index IX_FactResellerSalesBig_1 On [dbo].[FactResellerSalesBig] ([OrderDateKey] ASC, [ResellerKey] Asc);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_CurrencyKey] ON [dbo].[FactResellerSalesBig]
(
	[CurrencyKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_DueDateKey] ON [dbo].[FactResellerSalesBig]
(
	[DueDateKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_EmployeeKey] ON [dbo].[FactResellerSalesBig]
(
	[EmployeeKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_ProductKey] ON [dbo].[FactResellerSalesBig]
(
	[ProductKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_PromotionKey] ON [dbo].[FactResellerSalesBig]
(
	[PromotionKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_ShipDateKey] ON [dbo].[FactResellerSalesBig]
(
	[ShipDateKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_CarrierTrackingNumber] ON [dbo].[FactResellerSalesBig]
(
	[CarrierTrackingNumber] ASC
);


Create NonClustered Columnstore Index IX_FactResellerSalesBig_NCI On [dbo].[FactResellerSalesBig]
(
[ProductKey]
,[OrderDateKey]
,[DueDateKey]
,[ShipDateKey]
,[ResellerKey]
,[EmployeeKey]
,[PromotionKey]
,[CurrencyKey]
,[SalesTerritoryKey]
,[SalesOrderNumber]
,[SalesOrderLineNumber]
,[RevisionNumber]
,[OrderQuantity]
,[UnitPrice]
,[ExtendedAmount]
,[UnitPriceDiscountPct]
,[DiscountAmount]
,[ProductStandardCost]
,[TotalProductCost]
,[SalesAmount]
,[TaxAmt]
,[Freight]
,[CarrierTrackingNumber]
,[CustomerPONumber]
,[OrderDate]
,[DueDate]
,[ShipDate]
)

GO

 (Note: If you have problem to copy and paste the script into the SSMS, copy it and paste to Office Word first and then paste to SSMS.)

Now we have a bigger yet still meaningful by randomizing the reseller and quantity. The table size after 2 bigger tables are added in as below:

 

As you can see, now we have a big FactResellerSalesbig table with 10,953,900 rows and 2.6 GB data size. From next few of blogs, I'm going to dig into the columnstore index to see what make it different.

Link:

SQL Server 2012 Columnstore Index In Depth Part 2: Columnstore Index Size Calculation 

 

Posted in: Columnstore Index | SQL Server 2012

Tags: , ,

Comments (10) -

United States best massage chair says:

My partner and I stumbled over here  different website and thought I might as well check things out. I like what I see so i am just following you. Look forward to finding out about your web page repeatedly.|

Reply

United States go to this site says:

I simply want to say I'm beginner to blogging and site-building and certainly loved your blog site. Likely I’m want to bookmark your website . You amazingly have awesome articles and reviews. Many thanks for sharing with us your website page.

Reply

United States Matilda Laura says:

If you want to rank on top of search engines like our sample result, just use speed rank seo. Just google it.

Reply

United States Sara Houston says:

After reading your blog post I browsed your website a bit and noticed you arenot ranking nearly as well in Google as you could be. I possess a handful of blogs myself and I think you should take a look here: http://seoreportingtools.com You will find its a very nice tool that can bring you a lot more visitors. Keep up the quality posts.

Reply

United States Clara Nova says:

I just found your website on second page of serach engine, your website is very interesting and have good content. I possess a handful of website myself and I think you should take a look here: http://www.latestseotechniques.com You will find its a very nice forum that can give you access to all free seo techniques to get top rank on search engines. Keep up the quality posts.

Reply

I simply want to tell you that I am beginner to blogging and site-building and absolutely savored this web blog. Most likely I’m likely to bookmark your site . You certainly have fantastic article content. Thanks for sharing with us your website page.

Reply

United States click here now says:

I just want to tell you that I am just newbie to blogs and seriously savored you're web blog. Likely I’m want to bookmark your site . You actually have wonderful articles. Thanks for revealing your webpage.

Reply

United States read this article says:

I simply want to mention I am new to blogging and actually liked this website. Likely I’m want to bookmark your website . You surely have awesome posts. Thank you for sharing your website page.

Reply

Tester2017 says:

Hello,
I ran your script and it worked perfectly. Many thanks for sharing it!

I have one extra wish. Maybe you already did it since 2012?
I wish the dates columns of table [FactResellerSalesBig] would be uptodate, which means the fields  R.[OrderDateKey], R.[DueDateKey], R.[ShipDateKey], R.[OrderDate], R.[DueDate], R.[ShipDate] could be shifted in the future, using for example an additional parameter N for "number of years shifted".

Advantage : show "recent" data when presenting test prototypes. The Parameter N leaves the decision to the person, to which time in the future he wants to shift the data.

and these 6 columns whould be :
R.[OrderDateKey]+ N0000 etc.
This work for the type int columns, but not for the dates columns. Furthermore the bissextile property of a year has to be considered.
And even better, other tables may be worth to be shifted in the future of the same number of days as well Smile

FactProductInventory
FactResellerSales
FactInternetSales
FactFinance
FactCurrencyRate
NewFactCurrencyRate
FactSalesQuota
FactCallCenter


This would be great !
Regards,

Reply

Tester2017 says:

sorry I mean (i forgot one zero):
R.[OrderDateKey]+ N00000 etc.

Reply

Pingbacks and trackbacks (1)+

Add comment