sql 2014

Mimicking a merge statement in Native compiled stored procedures in SQL 2014 in-memory tables

I needed to convert a stored procedure to native compiled in sql2014 in-memory tables, but since neither the merge nor the left join operators are supported in native stored procedure a twist of creativity was needed.

Inspired by http://msdn.microsoft.com/en-us/library/dn579375%28v=sql.120%29.aspx I decided on a row-by-row while loop that checks if the row exists in the destination table and in that case updates the row, if it doesn’t the row is added. Further cases can be added to the statements and more cases of updates etc appended to mimic even the most advanced merge statement.  Due to the machine code compiling, this approach works at least as fast as a regular merge statement.

The code to merge two copies of a blown-up version of  AdventureWorksDW2012 below, first example merges FactInternetSales, the second makes a type2 slowly changing dimension behavior merge of DimEmployees.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [NativeSPs].[ImportSales]
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS BEGIN ATOMIC WITH
(
 TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english'
)
--varables needed to make the loop work
declare @i int = 1, @max int
select @max = count(id) from dsa.erp_sales –the number of loops to be maked

declare @ProductKey nvarchar(250)
      ,@ResellerKey nvarchar(250)
      ,@EmployeeKey nvarchar(250)
      ,@CurrencyKey nvarchar(250)
      ,@SalesTerritoryKey nvarchar(250)
      ,@SalesOrderNumber nvarchar(250)
      ,@SalesOrderLineNumber tinyint
      ,@OrderQuantity smallint
      ,@UnitPrice money
      ,@CarrierTrackingNumber nvarchar(250)
      ,@OrderDate date
      ,@DueDate date
      ,@ShipDate date

--starting the loop
while @i <= @max
begin

--populating values for the row being treated
select
       @ProductKey = ProductKey
      ,@ResellerKey =  ResellerKey
      ,@EmployeeKey = EmployeeKey
      ,@CurrencyKey = CurrencyKey
      ,@SalesTerritoryKey = SalesTerritoryKey
      ,@SalesOrderNumber = SalesOrderNumber
      ,@SalesOrderLineNumber = SalesOrderLineNumber
      ,@OrderQuantity = OrderQuantity
      ,@UnitPrice = UnitPrice
      ,@CarrierTrackingNumber = CarrierTrackingNumber
      ,@OrderDate = OrderDate
      ,@DueDate = DueDate
      ,@ShipDate = ShipDate
FROM dsa.erp_sales
where id = @i

--declaring variable to be used to check if row exits
declare @LineExists bit = 0, @EDWID uniqueidentifier

--check if row exists in the destination tables based on business keys
SELECT TOP 1 @LineExists = 1, @EDWID = Id from edw.Sales
       WHERE SalesOrderNumber = @SalesOrderNumber
       AND SalesOrderLineNumber = @SalesOrderLineNumber

--if row does not exists, it is added
IF @LineExists = 0
BEGIN
insert into edw.Sales (StatusType, ImportId, OrderQuantity, UnitPrice, DueDate, OrderDate, ShipDate, CarrierTrackingNumber, ProductId, SalesTerritoryId, CurrencyId, ProductOriginalKey, SalesTerritoryOriginalKey, CurrencyOriginalKey, SalesOrderLineNumber, SalesOrderNumber, EmployeeId, EmployeeOriginalKey, ResellerId, ResellerOriginalKey )
values (3,0, @OrderQuantity, @UnitPrice,  @DueDate, @OrderDate, @ShipDate, @CarrierTrackingNumber,'18785F54-D73F-44D0-B88D-036DACFD6888', '18785F54-D73F-44D0-B88D-036DACFD6888' , '18785F54-D73F-44D0-B88D-036DACFD6888' , @ProductKey, @SalesTerritoryKey, @CurrencyKey,  @SalesOrderLineNumber, @SalesOrderNumber, '18785F54-D73F-44D0-B88D-036DACFD6888' , @EmployeeKey, '18785F54-D73F-44D0-B88D-036DACFD6888', @ResellerKey)
END

--if row exists, it is updated
IF @LineExists = 1
BEGIN
UPDATE edw.Sales
SET
    OrderQuantity = @OrderQuantity
  , UnitPrice = @UnitPrice
  , DueDate =  @DueDate
  , OrderDate = @OrderDate
  , ShipDate = @ShipDate
  , CarrierTrackingNumber = @CarrierTrackingNumber
  , ProductOriginalKey = @ProductKey
  , SalesTerritoryOriginalKey = @SalesTerritoryKey
  , CurrencyOriginalKey = @CurrencyKey
  , EmployeeOriginalKey = @EmployeeKey
  , ResellerOriginalKey = @ResellerKey
WHERE Id = @EDWID
END

--increment the value of @i and restart the loop
set @i = @i+1
END
END

The statement can, as mentioned,  be elaborated further to allow different cases of updating, the relevant part of such an example below. This cased is used to make a type2 slowly changing dimension

--check if row exists in the destination tables based on business keys
SELECT TOP 1 @LineExists = 1, @EDWID = Id, @EDWParentKey = ParentBusinessKey from edw.Employee
      WHERE BusinessKey = @BusinessKey

--If the line does not exits OR line exits but with a different parentkey the line is added
IF (@LineExists = 0 OR (@LineExists = 1 AND IsNull(@EDWParentKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2) <> IsNull(@ParentBusinessKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2)))
BEGIN
insert into edw.Employee ([ParentBusinessKey], [DisplayName], [ImportId], [IsCurrent], [StartDate], [BusinessKey])
values (@ParentBusinessKey, @DisplayName, 0,1, cast(getdate() as date),@BusinessKey )
END

--If the line exits AND parentkey is equal to the incoming parent key the name is updated
IF (@LineExists = 1 AND IsNull(@EDWParentKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2) <> IsNull(@ParentBusinessKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2))
BEGIN
UPDATE edw.Employee
SET
   [DisplayName] = @DisplayName
WHERE Id = @EDWID
END

--If the line exits but with a different parentkey the old line id marked with an expiration date
IF (@LineExists = 1 AND IsNull(@EDWParentKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2) <> IsNull(@ParentBusinessKey  COLLATE Latin1_General_100_BIN2,'NA'  COLLATE Latin1_General_100_BIN2))
BEGIN
UPDATE edw.Employee
SET
   [EndDate] = cast(getdate() as date)
WHERE Id = @EDWID
END

As the procedure is compiled to machine code this is lightning fast, a million rows takes 12 sec to run through the procedure on a small machine whereas executing the loop as SQL on the in-memory tables takes 3+ minutes, so a great deal of the performance gain come from compiling the code to native stored procedures.

Advertisements