Author: Rasmus Reinholdt

Teamleader and Managing Consultant (BI) at Rehfeld Partners A/S, speaker, (virgin) blogger and active in the SQL community. Primary areas of interest are SSIS and SQL

Generic Merge statement based on Primary-key

Needed a generic merge statement that would take two identical tables (target and a staging table) and merge them based on the primary key of the destination table, could not find one online, so here it is.

– The code can easily be changed to use the primary key of the source table if another key exits on the target


/*
 usp_GenericMerge @SourceSchema = 'etl', @SourceTable='Customers', @TargetSchema='Dim', @TargetTable='Customers'
 */
 CREATE PROCEDURE usp_GenericMerge (
 @SourceSchema sysname,
 @SourceTable sysname,
 @TargetSchema sysname,
 @TargetTable sysname,
 @Debug bit = 0
 )
 AS
/*
 Merges two table based on the primary key of the Target table
 
 By Rasmus Reinholdt
 LinkedIn: dk.linkedin.com/in/rasmusreinholdt/
 Twitter: @RasmusReinholdt

 All code samples are provided “AS IS” without warranty of
 any kind, either express or implied, including but not
 limited to the implied warranties of merchantability
 and/or fitness for a particular purpose.
 Modeled on http://www.purplefrogsystems.com/download/blog/GenerateMerge.sql
 */

--The variable to hold the dynamic sql is split as the endresult exceeds an nvarchar(max)
 DECLARE       @MergeSQL NVARCHAR(MAX)
 DECLARE       @UpdateSQL NVARCHAR(MAX)
 DECLARE       @InsertSQL NVARCHAR(MAX)

--Create Carriage return variable to help format the resulting query
 DECLARE @crlf char(2)
 SET @crlf = CHAR(13)
 DECLARE @Field varchar(255)

--Cursor for primarykey columns
 DECLARE myCurPK Cursor FOR
 SELECT c.name
 FROM sys.columns c
 INNER JOIN sys.indexes i 
  ON c.object_id = i.object_id
 INNER JOIN sys.index_columns IC 
  ON IC.column_id = c.column_id
  AND IC.object_id = c.object_id
  AND i.index_id = IC.index_id
 WHERE OBJECT_NAME(c.OBJECT_ID) = @TargetTable
  AND i.is_primary_key = 1

--Cursor for columns to be updated (all columns except the primary key columns)
 DECLARE myCurUpdate CURSOR FOR
 SELECT c.name
 FROM sys.columns c
 left JOIN sys.index_columns IC 
  ON IC.column_id = c.column_id
  AND IC.object_id = c.object_id
  AND IC.column_id = c.column_id
 left JOIN sys.indexes i ON i.object_id = ic.object_id
  AND i.index_id = IC.index_id
 WHERE OBJECT_NAME(c.OBJECT_ID) = @TargetTable
  AND ISNULL(i.is_primary_key,0) = 0 

--Cursor for all columns, used for insert
 DECLARE myCurALL CURSOR FOR
 SELECT c.name
 FROM sys.columns c
 WHERE OBJECT_NAME(c.OBJECT_ID) = @TargetTable

--Building the DynamicSQL
 SET @MergeSQL = '
 MERGE ' + @TargetSchema + '.' + @TargetTable + ' AS Target
 USING ' + @SourceSchema +'.'+ @SourceTable + ' AS Source
 ON '
 OPEN myCurPK
 FETCH NEXT FROM myCurPK INTO @Field
 IF (@@FETCH_STATUS>=0)
 BEGIN
 SET @MergeSQL = @MergeSQL + @crlf + '           SOURCE.' + @Field + ' = TARGET.'  + @Field
 FETCH NEXT FROM myCurPK INTO @Field
 END
 WHILE (@@FETCH_STATUS<>-1)
 BEGIN
 IF (@@FETCH_STATUS<>-2)
 SET @MergeSQL = @MergeSQL + @crlf + '           AND SOURCE.' + @Field + ' = TARGET.'  + @Field
 FETCH NEXT FROM myCurPK INTO @Field
 END
 CLOSE myCurPK

SET @UpdateSQL =  'WHEN MATCHED ' + @crlf + '
 THEN UPDATE SET'
 OPEN myCurUpdate
 FETCH NEXT FROM myCurUpdate INTO @Field
 IF (@@FETCH_STATUS>=0)
 BEGIN
 SET @UpdateSQL = @UpdateSQL + @crlf + '           ' + @Field + ' = SOURCE.'  + @Field
 FETCH NEXT FROM myCurUpdate INTO @Field
 END
 WHILE (@@FETCH_STATUS<>-1)
 BEGIN
 IF (@@FETCH_STATUS<>-2)
 SET @UpdateSQL = @UpdateSQL + @crlf + '           ,' + @Field + ' = SOURCE.'  + @Field
 FETCH NEXT FROM myCurUpdate INTO @Field
 END
 CLOSE myCurUpdate

SET @InsertSQL =  @crlf +
 'WHEN NOT MATCHED THEN
 INSERT (
 '
 OPEN myCurALL
 FETCH NEXT FROM myCurALL INTO @Field
 IF (@@FETCH_STATUS>=0)
 BEGIN
 SET @InsertSQL = @InsertSQL + @crlf + '           ' + @Field
 FETCH NEXT FROM myCurALL INTO @Field
 END
 WHILE (@@FETCH_STATUS<>-1)
 BEGIN
 IF (@@FETCH_STATUS<>-2)
 SET @InsertSQL = @InsertSQL + @crlf + '           ,' + @Field
 FETCH NEXT FROM myCurALL INTO @Field
 END
 CLOSE myCurALL

SET @InsertSQL =  @InsertSQL + @crlf + '
 )'

SET @InsertSQL =  @InsertSQL + @crlf +
 'VALUES ('
 OPEN myCurALL
 FETCH NEXT FROM myCurALL INTO @Field
 IF (@@FETCH_STATUS>=0)
 BEGIN
 SET @InsertSQL = @InsertSQL + @crlf + '           SOURCE.' + @Field
 FETCH NEXT FROM myCurALL INTO @Field
 END
 WHILE (@@FETCH_STATUS<>-1)
 BEGIN
 IF (@@FETCH_STATUS<>-2)
 SET @InsertSQL = @InsertSQL + @crlf + '           , SOURCE.' + @Field
 FETCH NEXT FROM myCurALL INTO @Field
 END
 CLOSE myCurALL
 SET @InsertSQL =  @InsertSQL + @crlf +
 '      )
 ;'

--clean up
 DEALLOCATE myCurUpdate
 DEALLOCATE myCurAll
 DEALLOCATE myCurPK

IF @Debug = 0
 BEGIN
 EXEC(@MergeSQL + @UpdateSQL + @InsertSQL)
 END
 ELSE
 BEGIN
 PRINT @MergeSQL
 PRINT @UpdateSQL
 PRINT @InsertSQL
 END
Advertisements

Mimicking a merge statement in Native compiled stored procedures in SQL 2014 in-memory tables

I needed to convert a stored procedure to native compiled in sql2014 in-memory tables, but since neither the merge nor the left join operators are supported in native stored procedure a twist of creativity was needed.

Inspired by http://msdn.microsoft.com/en-us/library/dn579375%28v=sql.120%29.aspx I decided on a row-by-row while loop that checks if the row exists in the destination table and in that case updates the row, if it doesn’t the row is added. Further cases can be added to the statements and more cases of updates etc appended to mimic even the most advanced merge statement.  Due to the machine code compiling, this approach works at least as fast as a regular merge statement.

The code to merge two copies of a blown-up version of  AdventureWorksDW2012 below, first example merges FactInternetSales, the second makes a type2 slowly changing dimension behavior merge of DimEmployees.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [NativeSPs].[ImportSales]
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS BEGIN ATOMIC WITH
(
 TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english'
)
--varables needed to make the loop work
declare @i int = 1, @max int
select @max = count(id) from dsa.erp_sales –the number of loops to be maked

declare @ProductKey nvarchar(250)
      ,@ResellerKey nvarchar(250)
      ,@EmployeeKey nvarchar(250)
      ,@CurrencyKey nvarchar(250)
      ,@SalesTerritoryKey nvarchar(250)
      ,@SalesOrderNumber nvarchar(250)
      ,@SalesOrderLineNumber tinyint
      ,@OrderQuantity smallint
      ,@UnitPrice money
      ,@CarrierTrackingNumber nvarchar(250)
      ,@OrderDate date
      ,@DueDate date
      ,@ShipDate date

--starting the loop
while @i <= @max
begin

--populating values for the row being treated
select
       @ProductKey = ProductKey
      ,@ResellerKey =  ResellerKey
      ,@EmployeeKey = EmployeeKey
      ,@CurrencyKey = CurrencyKey
      ,@SalesTerritoryKey = SalesTerritoryKey
      ,@SalesOrderNumber = SalesOrderNumber
      ,@SalesOrderLineNumber = SalesOrderLineNumber
      ,@OrderQuantity = OrderQuantity
      ,@UnitPrice = UnitPrice
      ,@CarrierTrackingNumber = CarrierTrackingNumber
      ,@OrderDate = OrderDate
      ,@DueDate = DueDate
      ,@ShipDate = ShipDate
FROM dsa.erp_sales
where id = @i

--declaring variable to be used to check if row exits
declare @LineExists bit = 0, @EDWID uniqueidentifier

--check if row exists in the destination tables based on business keys
SELECT TOP 1 @LineExists = 1, @EDWID = Id from edw.Sales
       WHERE SalesOrderNumber = @SalesOrderNumber
       AND SalesOrderLineNumber = @SalesOrderLineNumber

--if row does not exists, it is added
IF @LineExists = 0
BEGIN
insert into edw.Sales (StatusType, ImportId, OrderQuantity, UnitPrice, DueDate, OrderDate, ShipDate, CarrierTrackingNumber, ProductId, SalesTerritoryId, CurrencyId, ProductOriginalKey, SalesTerritoryOriginalKey, CurrencyOriginalKey, SalesOrderLineNumber, SalesOrderNumber, EmployeeId, EmployeeOriginalKey, ResellerId, ResellerOriginalKey )
values (3,0, @OrderQuantity, @UnitPrice,  @DueDate, @OrderDate, @ShipDate, @CarrierTrackingNumber,'18785F54-D73F-44D0-B88D-036DACFD6888', '18785F54-D73F-44D0-B88D-036DACFD6888' , '18785F54-D73F-44D0-B88D-036DACFD6888' , @ProductKey, @SalesTerritoryKey, @CurrencyKey,  @SalesOrderLineNumber, @SalesOrderNumber, '18785F54-D73F-44D0-B88D-036DACFD6888' , @EmployeeKey, '18785F54-D73F-44D0-B88D-036DACFD6888', @ResellerKey)
END

--if row exists, it is updated
IF @LineExists = 1
BEGIN
UPDATE edw.Sales
SET
    OrderQuantity = @OrderQuantity
  , UnitPrice = @UnitPrice
  , DueDate =  @DueDate
  , OrderDate = @OrderDate
  , ShipDate = @ShipDate
  , CarrierTrackingNumber = @CarrierTrackingNumber
  , ProductOriginalKey = @ProductKey
  , SalesTerritoryOriginalKey = @SalesTerritoryKey
  , CurrencyOriginalKey = @CurrencyKey
  , EmployeeOriginalKey = @EmployeeKey
  , ResellerOriginalKey = @ResellerKey
WHERE Id = @EDWID
END

--increment the value of @i and restart the loop
set @i = @i+1
END
END

The statement can, as mentioned,  be elaborated further to allow different cases of updating, the relevant part of such an example below. This cased is used to make a type2 slowly changing dimension

--check if row exists in the destination tables based on business keys
SELECT TOP 1 @LineExists = 1, @EDWID = Id, @EDWParentKey = ParentBusinessKey from edw.Employee
      WHERE BusinessKey = @BusinessKey

--If the line does not exits OR line exits but with a different parentkey the line is added
IF (@LineExists = 0 OR (@LineExists = 1 AND IsNull(@EDWParentKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2) <> IsNull(@ParentBusinessKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2)))
BEGIN
insert into edw.Employee ([ParentBusinessKey], [DisplayName], [ImportId], [IsCurrent], [StartDate], [BusinessKey])
values (@ParentBusinessKey, @DisplayName, 0,1, cast(getdate() as date),@BusinessKey )
END

--If the line exits AND parentkey is equal to the incoming parent key the name is updated
IF (@LineExists = 1 AND IsNull(@EDWParentKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2) <> IsNull(@ParentBusinessKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2))
BEGIN
UPDATE edw.Employee
SET
   [DisplayName] = @DisplayName
WHERE Id = @EDWID
END

--If the line exits but with a different parentkey the old line id marked with an expiration date
IF (@LineExists = 1 AND IsNull(@EDWParentKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2) <> IsNull(@ParentBusinessKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2))
BEGIN
UPDATE edw.Employee
SET
   [EndDate] = cast(getdate() as date)
WHERE Id = @EDWID
END

As the procedure is compiled to machine code this is lightning fast, a million rows takes 12 sec to run through the procedure on a small machine whereas executing the loop as SQL on the in-memory tables takes 3+ minutes, so a great deal of the performance gain come from compiling the code to native stored procedures.