Tell me more ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I have a merge statement that takes around 10 minutes to process 5 million or more records.

The merge statement is part of a stored procedure that takes my newly bulk loaded staging table then runs data integrity checks, transforms the data into the proper format, and inserts that data where it should go.

Here are the tables:

CREATE TABLE [dbo].[OrderFact]
(
     Id BIGINT IDENTITY PRIMARY KEY
    ,CustomerId INT NOT NULL REFERENCES Customer (Id)
    ,BillingProcessorId INT NOT NULL REFERENCES BillingProcessor (Id)
    ,Quantity INT NOT NULL
    ,OrderStatus TINYINT NULL
    ,Cost MONEY NOT NULL
    ,AdjustedCost MONEY NOT NULL
    ,CreatedDate DATE NOT NULL DEFAULT GETUTCDATE()
    ,UpdatedDate DATE NOT NULL DEFAULT GETUTCDATE()
    ,IsDelete BIT NOT NULL DEFAULT (0)
);
GO

CREATE TABLE [Staging].[VendorAOrder]
(
     OrderId INT
    ,Quantity INT NULL
    ,Cost MONEY NULL
    ,OrderStatus TINYINT NULL
    ,SellDate DATETIME2 NULL
    ,CustomerId NVARCHAR(20) NULL
    ,VendorPrefix NVARCHAR(20) NULL
    ,DataSetId INT NULL
)
GO

CREATE TABLE [Maps].[VendorAOrder]
(
     Id BIGINT IDENTITY PRIAMRY KEY
    ,OrderFactId BIGINT NOT NULL REFERENCES OrderFact (Id)
    ,CreatedDate DATE NOT NULL DEFAULT GETUTCDATE()
    ,UpdatedDate DATE NOT NULL DEFAULT GETUTCDATE()
    ,IsDelete BIT NOT NULL DEFAULT (0)
);
GO

CREATE TABLE [Maps].[VendorARelatedOrder]
(
     Id INT IDENTITY PRIMARY KEY
    ,VendorPrefix NVARCHAR(20) NOT NULL
    ,DataSetId INT NOT NULL REFERENCES DataSet (Id)
);
GO

CREATE TABLE [Maps].[VendorACustomer]
(
     Id INT IDENTITY PRIMARY KEY
    ,CustomerId INT NOT NULL REFERENCES Customer (Id)
    ,OtherCustomerId NVARCHAR(20) NOT NULL
    ,CreatedDate DATE NOT NULL DEFAULT GETUTCDATE()
    ,UpdatedDate DATE NOT NULL DEFAULT GETUTCDATE()
    ,IsDelete BIT NOT NULL DEFAULT (0)
);
GO

Here is the merge statement:

--In stored procedure

CREATE TABLE #OrderMaps
(
     Id INT IDENTITY PRIMARY KEY
    ,OrderFactId BIGINT NOT NULL
    ,OrderId INT NOT NULL
    ,RelatedOrderId INT NOT NULL
)

-- Merge statement
-- Gets all non duplicate orders and inserts them into OrderFact
-- Outputs OrderId and new OrderFactId into temp table
-- Which is later inserted into [Maps].[VendorAOrder]
MERGE [OrderFact] AS [Target]
USING
(
    SELECT
        ,mc.CustomerId
        ,b.Id AS BillingProcessorId
        ,o.OrderId
        ,ro.Id AS RelatedOrderId
        ,o.Quantity
        ,o.OrderStatus
        ,o.Cost
        ,AdjustedCost = CASE WHEN OrderStatus=1 THEN -Cost ELSE Cost END
        ,o.SellDate
    FROM
    (
        SELECT *
        FROM [Staging].[VendorAOrder] o
        WHERE NOT EXISTS
        (
            SELECT 1
            FROM [Maps].[VendorAOrder] orders
            JOIN OrderFact of
            ON order.OrderFactId = of.Id
            AND orders.OrderId = o.OrderId
            AND of.DataSetId = o.DataSetId
            AND of.IsDelete = 0
        )
    ) o
    JOIN Maps.VendorARelatedOrder ro
    ON or.VendorPrefix = o.VendorPrefix
    AND or.DataSetId = o.DataSetId
    JOIN BillingProcessor b
    ON b.Id = o.BillingProcessorId
    JOIN [Maps].VendorACustomer mc
    ON mc.OtherCustomerId = o.CustomerId
    AND mc.VendorId = o.VendorId
) AS [Source]
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
    INSERT
        (
             CustomerId
            ,BillingProcessorId
            ,Quantity
            ,OrderStatus
            ,Cost
            ,AdjustedCost
            ,SellDate
        )
    VALUES
        (
             [Source].CustomerId
            ,[Source].BillingProcessorId
            ,[Source].Quantity
            ,[Source].OrderStatus
            ,[Source].Cost
            ,[Source].AdjustedCost
            ,[Source].SellDate
        )
    OUTPUT (Inserted.Id, OrderId, RelatedOrderId)
    INTO #OrderMaps(OrderFactId, OrderId, RelatedOrderId)
; -- End merge

Some things that I have tried that partially improved performance:

Create a temp table to pre filter orders

CREATE TABLE #DistinctOrders
(
     Id INT IDENTITY PRIMARY KEY
    ,OrderId NOT INT
    ,Quantity INT NOT NULL
    ,Cost MONEY NOT NULL
    ,OrderStatus TINYINT NOT NULL
    ,SellDate DATETIME2 NOT NULL
    ,CustomerId NVARCHAR(20) NOT NULL
    ,VendorPrefix NVARCHAR(20) NOT NULL
    ,DataSetId INT NOT NULL
    ,RelatedOrderId INT NOT NULL
);
GO

I could insert the orders into this table before I used the Merge Statement, but I am not sure of its performance increase.

share|improve this question
Why is your merge condition ON 1 = 0? It would help if you post index DDL as well as the execution plan for this query. And why do you say that you "tried" using a temp table but you're "not sure of its performance increase"? If you tried it, then presumably you know what the result was? – Pondlife Apr 17 at 21:34

Know someone who can answer? Share a link to this question via email, Google+, Twitter, or Facebook.

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Browse other questions tagged or ask your own question.