Database Programming: Why I Like MERGE

download CU1 for SQL Server 2008 SP1In the wake of last month’s post on my new development efforts in SQL Server 2008, several of you have asked me to go into further detail. This post is my first effort to answer those questions; right now, we’ll focus on these two, which relate to this statement:

By using a CTE with a MERGE and a TRY-CATCH block, in a single statement I can:

  • shred the XML document containing user input;
  • update existing records;
  • modify existing records;
  • delete records (either logically or physically);
  • create a driver table for child processes containing PKs of inserted records (isolating the PKs of just-inserted records was a huge challenge until the OUTPUT clause was introduced in SQL Server 2005).

Before I present the code, I can partially address Adam’s performance question: performance does appear to suffer some under load. My performance tuning is incomplete, so it’s hard for me to say at the moment how much of this about MERGE and how much of this about the state of my progress in the tuning analysis. Once the tuning is complete, our test team will be running an analysis to determine exactly where our code stops scaling.

We’re using the same code for user interactions that we are for our bulk load, which is pretty bold on our part, but also testimonial to the fundamental flexibility and power of the syntax. Our user interactions are running great; our data loader is currently running slower than we’d like but I’ve proposed an architectural change which should resolve that issue. I’ll of course report back.

Now, for Andrew Bauer and any other interested parties, here’s the code. The procedure this code is adapted from is one in a hierarchical series. Each procedure error-checks the data pertinent to its level in the hierarchy before processing the MERGE statement; the TRY-CATCH block in each procedure is designed to cumulative populate an XML document which is passed among the levels as an OUTPUT parameter.

I’ll present the CATCH block first, because it’s shorter:

BEGIN CATCH
      
    IF 0 = @CalledInternally
    BEGIN
      ROLLBACK TRANSACTION Data_Maintenance
    END
      
    -- Add error to ErrorXml
    SET @ParamResult = 0
    SELECT
            @ErrorNumber = ERROR_NUMBER()
            ,@ErrorMessage = ERROR_MESSAGE()
            ,@ErrorLine = ERROR_LINE()
            ,@ErrorModule= ERROR_PROCEDURE()
           
    SET @ParamErrorXml.modify
    (  
        'insert
            <Error
                Code = "{sql:variable("@ErrorNumber")}"
                Description = "{sql:variable("@ErrorMessage") }"
                LineNumber = "{sql:variable("@ErrorLine") }"
                ErrorModule = "{sql:variable("@ErrorModule") }"
            />
        as last into (/Errors)[1]'  
    )
     
END CATCH

Rather than the SET statement shown above, the production code calls a stored procedure which does a little more business to ensure that the XML is populated and properly formatted, but this SET statement, which is contained in the procedure, gets to the heart of the XML population.

The heart of this post is the MERGE statement, and it’s taken me long enough to get to it. I’ve tweaked it a bit to obfuscate the source, but the benefit to that is that I believe it to be self-documenting. Please leave a comment if you disagree.

Here’s the code..

-- Merge @IncomingXML into the BaseTable table

;WITH [IncomingXML] (
     [PrimaryKey_Provided]
    ,[PrimaryKey_Derived]
    ,[ParentKeysInXML]
    ,[Version]
    ,[NextVersion]
    ,[IncomingGuid]
    ,[OwnerID]
    ,[CheckedOut]
    ,[CheckedOutByID]
    ,[LanguageLocaleID]
    ,[ChildXML]
    ,[StatusID]
) AS (
    SELECT
         [xml].[PrimaryKey] AS PrimaryKey_Supplied
        ,[s].[PrimaryKey] AS [PrimaryKey_Derived]
        ,[xml].[ParentKeysInXML]
        ,[xml].[Version]
   ,(
            SELECT ISNULL(MAX([Version]),0)+1 AS [NextVersion]
            FROM [dbo].[BaseTable]
            WHERE [Guid] = [xml].[IncomingGuid]
        )
        ,[xml].[IncomingGuid]
        ,[xml].[CheckedOut]
        ,[xml].[CheckedOutByID]
        ,[xml].[ExampleTypeId]
        ,[xml].[LanguageLocaleID]
        ,[xml].[ChildXML]
        ,[xml].[StatusID]
    FROM (
        SELECT
             ref.value('@ID','int') AS [PrimaryKey]
            ,ref.query('Relationship') AS [ParentKeysInXML]
            ,ref.value('@Version','nvarchar(10)') AS [Version]
            ,ref.value('@Guid','uniqueidentifier') AS [IncomingGuid]
            ,ref.value('@OwnerID','int') AS [OwnerID]
            ,ref.value('@CheckedOut','bit') AS [CheckedOut]
            ,ref.value('@CheckOutByID','int') AS [CheckedOutByID]
            ,ref.value('@TypeID','int') AS [ExampleTypeId]
            ,ref.value('@LanguageLocaleID','int') AS [LanguageLocaleID]
            ,ref.query('ChildXML') AS [ChildXML]
            ,ref.value('@StatusID','int') AS [StatusID]
        FROM @IncomingXML.nodes('/root[1]/node') as node(ref)
    ) AS [xml]
    LEFT OUTER JOIN [dbo].[BaseTable] [s]
      ON [xml].[PrimaryKey] = [s].[PrimaryKey]
    )
MERGE
    INTO [dbo].[BaseTable] [table]
    USING [IncomingXML] [cte]
    ON [table].[PrimaryKey] = ISNULL([cte].[PrimaryKey_Derived], [cte].[PrimaryKey_Provided])

 

-- delete a record
-- note that we could also employ an UPDATE statement for a logical delete here; DELETE used for brevity
-- logical delete might also imply changing the following WHEN statement to something like
-- WHEN MATCHED AND [table].[Guid] = [cte].[IncomingGuid] AND [cte].[StatusID] =
-- (SELECT [StatusID] FROM [dbo].[Status] WHERE [StatusDescription] = 'delete')

    WHEN NOT MATCHED BY SOURCE
        THEN DELETE

 -- update existing record:
    WHEN MATCHED AND ([table].[Guid] = [cte].[IncomingGuid] or [cte].[IncomingGuid] IS NULL)
        THEN UPDATE SET
             [Version] = ISNULL([cte].[Version],[table].[Version])
            ,[CheckedOut] = ISNULL([cte].[CheckedOut],[table].[CheckedOut])
            ,[CheckedOutByID] = CASE
                                        WHEN ISNULL([cte].[CheckedOut],[table].[CheckedOut]) = 1
                                          THEN @UserNameID
                                        ELSE NULL
                                    END
          ,[Guid] = ISNULL([cte].[IncomingGuid],[table].[Guid])
            ,[ExampleTypeId] = ISNULL([cte].[ExampleTypeId],[table].[ExampleTypeId])
            ,[LanguageLocaleID] = ISNULL([cte].[LanguageLocaleID],[table].[LanguageLocaleID])
            ,[StatusID] = ISNULL([cte].[StatusID],[table].[StatusID])
            ,[LastModifiedDate] = @Now
            ,[LastModifiedByID] = @UserNameID

 -- insert new record:
    WHEN NOT MATCHED BY TARGET AND [cte].[IncomingGuid] IS NULL
        THEN INSERT (
             [Version]
            ,[CheckedOut]
            ,[CheckedOutByID]
            ,[Guid]
            ,[ExampleTypeId]
            ,[LanguageLocaleID]
            ,[LastModifiedDate]
            ,[LastModifiedByID]
            ,[CreatedDate]
            ,[CreatedByID]
            ,[StatusID]
        ) VALUES (
             ISNULL([cte].[Version],[cte].[NextVersion])
            ,ISNULL([cte].[CheckedOut],0)
            ,CASE
      WHEN [cte].[CheckedOut] = 1 THEN [cte].[CheckedOutByID]
      ELSE NULL
            END
            ,NEWID()
            ,[cte].[ExampleTypeId]
            ,ISNULL([cte].[LanguageLocaleID],@DefaultLanguageLocale)
            ,@Now
            ,@UserNameID
            ,@Now
   ,@UserNameID
            ,ISNULL([cte].[StatusID],@DefaultStatus)
        )

-- populate a previously built table for the "child" procedure to use:
      OUTPUT
             ISNULL([inserted].[PrimaryKey],[deleted].[PrimaryKey])
            ,[cte].[ParentKeysInXML]
            ,ISNULL([inserted].[CheckedOut],[deleted].[CheckedOut])
            ,ISNULL([inserted].[CheckedOutByID],[deleted].[CheckedOutByID])
            ,ISNULL([inserted].[LanguageLocaleID],[deleted].[LanguageLocaleID])
            ,ISNULL([inserted].[StatusID],[deleted].[StatusID])
            ,[cte].[ChildXML]
            ,CASE
      WHEN [deleted].[PrimaryKey] IS NULL AND [inserted].[PrimaryKey] IS NOT NULL THEN 'I'
      WHEN [deleted].[PrimaryKey] IS NOT NULL AND [inserted].[PrimaryKey] IS NULL THEN 'D'
      ELSE 'U'
             END
    INTO #MapTableForChildSproc (
            [PrimaryKey]
            ,[ParentKeysInXML]
            ,[CheckedOut]
            ,[CheckedOutByID]
            ,[LanguageLocaleID]
            ,[StatusID]
            ,[ChildXML]
            ,[ActionInDatabase]
        );

Thanks for making it this far.. please let me know if you have any questions!

-wp


this copyrighted material was originally posted at https://blogs.technet.com/wardpond

the author and his employer are pleased to provide this content for you at that site, and via rss, free of charge and without advertising.

the author welcomes and appreciates links to and citations of his work. however, if you are viewing the full text of this article at any other website, be aware that its author does not endorse and is not compensated by any advertising or access fees you may be subjected to outside the original web and rss sites.

Comments

  • Anonymous
    August 24, 2009
    Awesome article. Thanks for the XML merge code. On another not; other than conversion of data types, have you done any XML data validation before? I was reading in the SQL documentation about typed XML data types and was curious if this could be used to achieve validation, but was not sure how to implement them.