Welcome to my blog, my main areas of Work and interest are SSIS and SQL, which also will be the main topics of my blogs.
You are always more than welcome to contact me or come by one of my talks
Needed a generic merge statement that would take two identical tables (target and a staging table) and merge them based on the primary key of the destination table, could not find one online, so here it is.
– The code can easily be changed to use the primary key of the source table if another key exits on the target
/* usp_GenericMerge @SourceSchema = 'etl', @SourceTable='Customers', @TargetSchema='Dim', @TargetTable='Customers' */ CREATE PROCEDURE usp_GenericMerge ( @SourceSchema sysname, @SourceTable sysname, @TargetSchema sysname, @TargetTable sysname, @Debug bit = 0 ) AS /* Merges two table based on the primary key of the Target table By Rasmus Reinholdt LinkedIn: dk.linkedin.com/in/rasmusreinholdt/ Twitter: @RasmusReinholdt All code samples are provided “AS IS” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose. Modeled on http://www.purplefrogsystems.com/download/blog/GenerateMerge.sql */ --The variable to hold the dynamic sql is split as the endresult exceeds an nvarchar(max) DECLARE @MergeSQL NVARCHAR(MAX) DECLARE @UpdateSQL NVARCHAR(MAX) DECLARE @InsertSQL NVARCHAR(MAX) --Create Carriage return variable to help format the resulting query DECLARE @crlf char(2) SET @crlf = CHAR(13) DECLARE @Field varchar(255) --Cursor for primarykey columns DECLARE myCurPK Cursor FOR SELECT c.name FROM sys.columns c INNER JOIN sys.indexes i ON c.object_id = i.object_id INNER JOIN sys.index_columns IC ON IC.column_id = c.column_id AND IC.object_id = c.object_id AND i.index_id = IC.index_id WHERE OBJECT_NAME(c.OBJECT_ID) = @TargetTable AND i.is_primary_key = 1 --Cursor for columns to be updated (all columns except the primary key columns) DECLARE myCurUpdate CURSOR FOR SELECT c.name FROM sys.columns c left JOIN sys.index_columns IC ON IC.column_id = c.column_id AND IC.object_id = c.object_id AND IC.column_id = c.column_id left JOIN sys.indexes i ON i.object_id = ic.object_id AND i.index_id = IC.index_id WHERE OBJECT_NAME(c.OBJECT_ID) = @TargetTable AND ISNULL(i.is_primary_key,0) = 0 --Cursor for all columns, used for insert DECLARE myCurALL CURSOR FOR SELECT c.name FROM sys.columns c WHERE OBJECT_NAME(c.OBJECT_ID) = @TargetTable --Building the DynamicSQL SET @MergeSQL = ' MERGE ' + @TargetSchema + '.' + @TargetTable + ' AS Target USING ' + @SourceSchema +'.'+ @SourceTable + ' AS Source ON ' OPEN myCurPK FETCH NEXT FROM myCurPK INTO @Field IF (@@FETCH_STATUS>=0) BEGIN SET @MergeSQL = @MergeSQL + @crlf + ' SOURCE.' + @Field + ' = TARGET.' + @Field FETCH NEXT FROM myCurPK INTO @Field END WHILE (@@FETCH_STATUS<>-1) BEGIN IF (@@FETCH_STATUS<>-2) SET @MergeSQL = @MergeSQL + @crlf + ' AND SOURCE.' + @Field + ' = TARGET.' + @Field FETCH NEXT FROM myCurPK INTO @Field END CLOSE myCurPK SET @UpdateSQL = 'WHEN MATCHED ' + @crlf + ' THEN UPDATE SET' OPEN myCurUpdate FETCH NEXT FROM myCurUpdate INTO @Field IF (@@FETCH_STATUS>=0) BEGIN SET @UpdateSQL = @UpdateSQL + @crlf + ' ' + @Field + ' = SOURCE.' + @Field FETCH NEXT FROM myCurUpdate INTO @Field END WHILE (@@FETCH_STATUS<>-1) BEGIN IF (@@FETCH_STATUS<>-2) SET @UpdateSQL = @UpdateSQL + @crlf + ' ,' + @Field + ' = SOURCE.' + @Field FETCH NEXT FROM myCurUpdate INTO @Field END CLOSE myCurUpdate SET @InsertSQL = @crlf + 'WHEN NOT MATCHED THEN INSERT ( ' OPEN myCurALL FETCH NEXT FROM myCurALL INTO @Field IF (@@FETCH_STATUS>=0) BEGIN SET @InsertSQL = @InsertSQL + @crlf + ' ' + @Field FETCH NEXT FROM myCurALL INTO @Field END WHILE (@@FETCH_STATUS<>-1) BEGIN IF (@@FETCH_STATUS<>-2) SET @InsertSQL = @InsertSQL + @crlf + ' ,' + @Field FETCH NEXT FROM myCurALL INTO @Field END CLOSE myCurALL SET @InsertSQL = @InsertSQL + @crlf + ' )' SET @InsertSQL = @InsertSQL + @crlf + 'VALUES (' OPEN myCurALL FETCH NEXT FROM myCurALL INTO @Field IF (@@FETCH_STATUS>=0) BEGIN SET @InsertSQL = @InsertSQL + @crlf + ' SOURCE.' + @Field FETCH NEXT FROM myCurALL INTO @Field END WHILE (@@FETCH_STATUS<>-1) BEGIN IF (@@FETCH_STATUS<>-2) SET @InsertSQL = @InsertSQL + @crlf + ' , SOURCE.' + @Field FETCH NEXT FROM myCurALL INTO @Field END CLOSE myCurALL SET @InsertSQL = @InsertSQL + @crlf + ' ) ;' --clean up DEALLOCATE myCurUpdate DEALLOCATE myCurAll DEALLOCATE myCurPK IF @Debug = 0 BEGIN EXEC(@MergeSQL + @UpdateSQL + @InsertSQL) END ELSE BEGIN PRINT @MergeSQL PRINT @UpdateSQL PRINT @InsertSQL END
I needed to convert a stored procedure to native compiled in sql2014 in-memory tables, but since neither the merge nor the left join operators are supported in native stored procedure a twist of creativity was needed.
Inspired by http://msdn.microsoft.com/en-us/library/dn579375%28v=sql.120%29.aspx I decided on a row-by-row while loop that checks if the row exists in the destination table and in that case updates the row, if it doesn’t the row is added. Further cases can be added to the statements and more cases of updates etc appended to mimic even the most advanced merge statement. Due to the machine code compiling, this approach works at least as fast as a regular merge statement.
The code to merge two copies of a blown-up version of AdventureWorksDW2012 below, first example merges FactInternetSales, the second makes a type2 slowly changing dimension behavior merge of DimEmployees.
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE PROCEDURE [NativeSPs].[ImportSales] WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER AS BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english' ) --varables needed to make the loop work declare @i int = 1, @max int select @max = count(id) from dsa.erp_sales –the number of loops to be maked declare @ProductKey nvarchar(250) ,@ResellerKey nvarchar(250) ,@EmployeeKey nvarchar(250) ,@CurrencyKey nvarchar(250) ,@SalesTerritoryKey nvarchar(250) ,@SalesOrderNumber nvarchar(250) ,@SalesOrderLineNumber tinyint ,@OrderQuantity smallint ,@UnitPrice money ,@CarrierTrackingNumber nvarchar(250) ,@OrderDate date ,@DueDate date ,@ShipDate date --starting the loop while @i <= @max begin --populating values for the row being treated select @ProductKey = ProductKey ,@ResellerKey = ResellerKey ,@EmployeeKey = EmployeeKey ,@CurrencyKey = CurrencyKey ,@SalesTerritoryKey = SalesTerritoryKey ,@SalesOrderNumber = SalesOrderNumber ,@SalesOrderLineNumber = SalesOrderLineNumber ,@OrderQuantity = OrderQuantity ,@UnitPrice = UnitPrice ,@CarrierTrackingNumber = CarrierTrackingNumber ,@OrderDate = OrderDate ,@DueDate = DueDate ,@ShipDate = ShipDate FROM dsa.erp_sales where id = @i --declaring variable to be used to check if row exits declare @LineExists bit = 0, @EDWID uniqueidentifier --check if row exists in the destination tables based on business keys SELECT TOP 1 @LineExists = 1, @EDWID = Id from edw.Sales WHERE SalesOrderNumber = @SalesOrderNumber AND SalesOrderLineNumber = @SalesOrderLineNumber --if row does not exists, it is added IF @LineExists = 0 BEGIN insert into edw.Sales (StatusType, ImportId, OrderQuantity, UnitPrice, DueDate, OrderDate, ShipDate, CarrierTrackingNumber, ProductId, SalesTerritoryId, CurrencyId, ProductOriginalKey, SalesTerritoryOriginalKey, CurrencyOriginalKey, SalesOrderLineNumber, SalesOrderNumber, EmployeeId, EmployeeOriginalKey, ResellerId, ResellerOriginalKey ) values (3,0, @OrderQuantity, @UnitPrice, @DueDate, @OrderDate, @ShipDate, @CarrierTrackingNumber,'18785F54-D73F-44D0-B88D-036DACFD6888', '18785F54-D73F-44D0-B88D-036DACFD6888' , '18785F54-D73F-44D0-B88D-036DACFD6888' , @ProductKey, @SalesTerritoryKey, @CurrencyKey, @SalesOrderLineNumber, @SalesOrderNumber, '18785F54-D73F-44D0-B88D-036DACFD6888' , @EmployeeKey, '18785F54-D73F-44D0-B88D-036DACFD6888', @ResellerKey) END --if row exists, it is updated IF @LineExists = 1 BEGIN UPDATE edw.Sales SET OrderQuantity = @OrderQuantity , UnitPrice = @UnitPrice , DueDate = @DueDate , OrderDate = @OrderDate , ShipDate = @ShipDate , CarrierTrackingNumber = @CarrierTrackingNumber , ProductOriginalKey = @ProductKey , SalesTerritoryOriginalKey = @SalesTerritoryKey , CurrencyOriginalKey = @CurrencyKey , EmployeeOriginalKey = @EmployeeKey , ResellerOriginalKey = @ResellerKey WHERE Id = @EDWID END --increment the value of @i and restart the loop set @i = @i+1 END END
The statement can, as mentioned, be elaborated further to allow different cases of updating, the relevant part of such an example below. This cased is used to make a type2 slowly changing dimension
--check if row exists in the destination tables based on business keys SELECT TOP 1 @LineExists = 1, @EDWID = Id, @EDWParentKey = ParentBusinessKey from edw.Employee WHERE BusinessKey = @BusinessKey --If the line does not exits OR line exits but with a different parentkey the line is added IF (@LineExists = 0 OR (@LineExists = 1 AND IsNull(@EDWParentKey COLLATE Latin1_General_100_BIN2,'NA' COLLATE Latin1_General_100_BIN2) <> IsNull(@ParentBusinessKey COLLATE Latin1_General_100_BIN2,'NA' COLLATE Latin1_General_100_BIN2))) BEGIN insert into edw.Employee ([ParentBusinessKey], [DisplayName], [ImportId], [IsCurrent], [StartDate], [BusinessKey]) values (@ParentBusinessKey, @DisplayName, 0,1, cast(getdate() as date),@BusinessKey ) END --If the line exits AND parentkey is equal to the incoming parent key the name is updated IF (@LineExists = 1 AND IsNull(@EDWParentKey COLLATE Latin1_General_100_BIN2,'NA' COLLATE Latin1_General_100_BIN2) <> IsNull(@ParentBusinessKey COLLATE Latin1_General_100_BIN2,'NA' COLLATE Latin1_General_100_BIN2)) BEGIN UPDATE edw.Employee SET [DisplayName] = @DisplayName WHERE Id = @EDWID END --If the line exits but with a different parentkey the old line id marked with an expiration date IF (@LineExists = 1 AND IsNull(@EDWParentKey COLLATE Latin1_General_100_BIN2,'NA' COLLATE Latin1_General_100_BIN2) <> IsNull(@ParentBusinessKey COLLATE Latin1_General_100_BIN2,'NA' COLLATE Latin1_General_100_BIN2)) BEGIN UPDATE edw.Employee SET [EndDate] = cast(getdate() as date) WHERE Id = @EDWID END
As the procedure is compiled to machine code this is lightning fast, a million rows takes 12 sec to run through the procedure on a small machine whereas executing the loop as SQL on the in-memory tables takes 3+ minutes, so a great deal of the performance gain come from compiling the code to native stored procedures.