Part 2 – OpenXML: What is it and how do I use it?

Introduction

This is the second post in a series of five on Using TVP’s and XML to Transport Relational Data

In this post we’ll look at how to use the OpenXML Rowset Provider to shred an XML document into a rowset which can then be used in Transact-SQL statements in which rowset providers such as a table, view, or the OPENROWSET function can appear. We’ll also look at how .Net Developers can use OpenXML to reduce the number of SQL Server calls required to handle data modifications.

For details of other posts in this series, please refer to Using TVP’s and XML to Transport Relational Data.

Brief History

The OpenXML Rowest Provider was first introduced in SQL Server 2000 where it could be used to shred an XML string into columns of a rowset by evaluating XPath 1.0 expressions. In SQL Server 2005 Microsoft enhanced OpenXML so that it could accept the newly introduced XML Data Type in sp_xml_preparedocument, and to generate XML and new SQL type columns in a rowset.

Syntax

The basic OpenXML syntax is:

OPENXML( idoc int [ in] , rowpattern nvarchar [ in ] , [ flags byte [ in ] ] )
[ WITH ( SchemaDeclaration | TableName ) ]
(Please refer to http://msdn.microsoft.com/en-us/library/ms186918.aspx for full syntax detail and argument options.)
Now that all the boring stuff is out of the way, let’s look at a very simple example of shredding an XML document into a rowest and then using that rowset to update records in our database.
 

Generating XML with .Net

For the purpose of this example, a Passenger is the primary person on a cruise booking and PAX are additional persons traveling with this passenger. Now, I know you’re probably already thinking “Why has he got FirstName and LastName in both tables?”. I’ve done this to make the example easier to follow.

Figure 1 represents an ADO.NET Dataset called dsPassengers which contains two tables (Passenger and PAX) and it has been created based on two tables in our SQL Server database called Passenger and PAX. As this is an internal use dataset only, I have blanked out the Namespace property for this dataset (I will elaborate on my reasons for doing this later).

The PersonID, PAXID, FirstName and LastName columns are all defined as System.String, the Newsletter column has been defined as System.Boolean, and the PAXOrder column has been defined as a System.Byte.

image

Figure 1

Let’s assume that I have a .Net application running in the background which has populated this dataset with the following records:

Passenger

PersonID: 7a7c18a0-ad47-e011-9c71-00155d991421

FirstName: Jeff

LastName: Wharton

Newsletter: False

PAX

PAXID: 868b8fde-068a-4468-b9b6-9660166e630d

FirstName: Lily

LastName: Wharton

PAXOrder: 3

PAXID: bb2770e0-b0e0-479f-8c03-1330ce87cb64

FirstName: Ben

LastName: Wharton

PAXOrder: 2

This gives me a total of 3 records (rows) that I’d like to send to SQL Server.

Given this situation, most people would probably set up a loop to traverse the tables in the dataset, set up another loop to traverse the records in each table, and send each row to the database one at a time resulting in 3 calls being made. This is where the power of XML comes into play.

If you call the GetXML() method of an ADO.NET Dataset, it returns an XML representation of the data held within it.  This is one of the benefits of using ADO.NET Datasets; it looks after all the messy work involved in creating an XML representation of your data.  So, for our dataset, the XML would look like:

<dsPassengers>
  <Passenger>
    <Newsletter>false</Newsletter>
    <PersonID>7a7c18a0-ad47-e011-9c71-00155d991421</PersonID>
    <FirstName>Jeff</FirstName>
    <LastName>Wharton</LastName>
  </Passenger>
  <PAX>
    <PAXID>868b8fde-068a-4468-b9b6-9660166e630d</PAXID>
    <FirstName>Lily</FirstName>
    <LastName>Wharton</LastName>
    <PaxOrder>3</PaxOrder>
  </PAX>
  <PAX>
    <PAXID>bb2770e0-b0e0-479f-8c03-1330ce87cb64</PAXID>
    <FirstName>Ben</FirstName>
    <LastName>Wharton</LastName>
    <PaxOrder>2</PaxOrder>
  </PAX>
</dsPassengers>

You can see from the returned XML that the Passenger and PAX tables are nested within dsPassenger and that there is 1 Passenger record (row) and two PAX records (rows).

Now that I’ve got this XML, I can start work on creating a stored procedure to process the XML (We’ll come back to the .Net side of things a bit later on).

Using OpenXML

The first step is to create two variables in SQL Server Query Analyser; one to hold the XML returned above and another to be used with OpenXML:

DECLARE @idoc int
DECLARE @doc XML = '
<dsPassengers>
  <Passenger>
    <Newsletter>false</Newsletter>
    <PersonID>7a7c18a0-ad47-e011-9c71-00155d991421</PersonID>
    <FirstName>Jeff</FirstName>
    <LastName>Wharton</LastName>
  </Passenger>
  <PAX>
    <PAXID>868b8fde-068a-4468-b9b6-9660166e630d</PAXID>
    <FirstName>Lily</FirstName>
    <LastName>Wharton</LastName>
    <PaxOrder>3</PaxOrder>
  </PAX>
  <PAX>
    <PAXID>bb2770e0-b0e0-479f-8c03-1330ce87cb64</PAXID>
    <FirstName>Ben</FirstName>
    <LastName>Wharton</LastName>
    <PaxOrder>2</PaxOrder>
  </PAX>
</dsPassengers>'

Now I need to write the code that reads the XML text provided as input, parses the text by using the MSXML parser (Msxmlsql.dll), and provides the parsed document in a state ready for consumption.

EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

sp_xml_preparedocument returns a handle (assigned to the @idoc variable I created earlier) that can be used to access the newly created internal representation of the XML document.

Now I need to write the code that takes this handle and passes it to the OpenXML Rowset Provider:

SELECT    PersonID, FirstName, LastName, Newsletter
FROM       OPENXML (@idoc, '/dsPassengers/Passenger',2)
            WITH (PersonID  varchar(50),
                  FirstName varchar(20),
                  LastName varchar(20),
                  Newsletter BIT)

SELECT    PersonID, PAXID, FirstName, LastName
FROM       OPENXML (@idoc, '/dsPassengers/PAX',2)
            WITH (PAXID  varchar(50),
                  FirstName varchar(20),
                  LastName varchar(20),
                  PaxOrder int,
                  -- If we use the optional ColPattern option, we can 
                  -- access data anywhere within the XML
                  PersonID varchar(50) '../Passenger/PersonID')
ORDER BY PaxOrder

I then need to write the code that removes the internal representation of the XML document specified by the document handle and invalidates the document handle.

EXEC sp_xml_removedocument @idoc

It is very important that you execute sp_xml_removedocument as soon as you can as a parsed document is stored in the internal cache of SQL Server and the MSXML parser (Msxmlsql.dll) uses one-eighth of the total memory available for SQL Server so we don’t want these little buggers hanging around for too long!

That’s it. If I execute the above code, it will return the following:

image

Because OpenXML has turned my XML data into a rowset, I can use the results in Transact-SQL statements in which rowset providers such as a table, view, or the OPENROWSET function can appear. For example:

SELECT    CruiseCode
FROM       OPENXML (@idoc, '/dsPassengers/Passenger',2)
            WITH (PersonID  varchar(50),
                  FirstName varchar(20),
                  LastName varchar(20),
                  Newsletter BIT) XMLData
INNER JOIN Booking.Actual BA ON XMLData.PersonID = BA.PersonID

UPDATE Person.Detail
SET FirstName = XMLData.FirstName, LastName = XMLData.LastName,
Newsletter = XMLData.Newsletter
FROM       OPENXML (@idoc, '/dsPassengers/Passenger',2)
            WITH (PersonID  varchar(50),
                  FirstName varchar(20),
                  LastName varchar(20),
                  Newsletter BIT) XMLData
WHERE Person.Detail.PersonID = XMLData.PersonID

image

Namespaces

Earlier on I stated that I was blanking out the Namespace property of dsPassenger. Here’s the reason why.

If I add a Namespace to our dataset (I’ve used the default assigned by ADO.NET), the XML returned from the GetXML() method looks this:

<dsPassengers xmlns="http://tempuri.org/dsPassengers.xsd">
  <Passenger>
    <Newsletter>false</Newsletter>
    <PersonID>7a7c18a0-ad47-e011-9c71-00155d991421</PersonID>
    <FirstName>Jeff</FirstName>
    <LastName>Wharton</LastName>
  </Passenger>
</dsPassengers>

I now need to account for this Namespace when I code our sp_xml_preparedocument statement

EXEC sp_xml_preparedocument @hdoc OUTPUT, @PassengerDetail,
  '<dsPassengers xmlns:a="http://tempuri.org/dsPassengers.xsd"/>'

and also in our OpenXML statement

SELECT [a:PersonID] PersonID, [a:FirstName] FirstName, 
[a:LastName] LastName, [a:Newsletter] Newsletter
FROM OPENXML(@hDoc, '//a:Passenger', 2)
WITH ([a:PersonID]  varchar(50),
                  [a:FirstName] varchar(20),
                  [a:LastName] varchar(20),
                  [a:Newsletter] BIT)

Now, for the XML purists out there, this is great.  For someone whose not sharing their XML outside of their organisation and just wants to shred-and-store, all this extra work is a pain in the backside so I never use a Namespace.

Stored Procedure

So now that I know my code is working and I’ve removed all the ‘”Namespace” changes, I’ll turn it into a Stored Procedure.

CREATE PROC [Person].[usp_UpdatePassengerDetail]
(
	@PassengerDetail XML
)
AS

SET NOCOUNT ON 

DECLARE @idoc int

EXEC sp_xml_preparedocument @idoc OUTPUT, @PassengerDetail

UPDATE Person.Detail
SET FirstName = XMLData.FirstName, LastName = XMLData.LastName,
Newsletter = XMLData.Newsletter
FROM       OPENXML (@idoc, '/dsPassengers/Passenger',2)
            WITH (PersonID  varchar(50),
                  FirstName varchar(20),
                  LastName varchar(20),
                  Newsletter BIT) XMLData
WHERE Person.Detail.PersonID = XMLData.PersonID 

UPDATE Person.PAX
SET PersonID = XMLData.PersonID, FirstName = XMLData.FirstName,
LastName = XMLData.LastName
FROM       OPENXML (@idoc, '/dsPassengers/PAX',2)
            WITH (PAXID  varchar(50),
                  FirstName varchar(20),
                  LastName varchar(20),
                  PaxOrder int,
                  PersonID varchar(50) '../Passenger/PersonID') XMLData
WHERE Person.PAX.PAXID = XMLData.PAXID 

EXEC sp_xml_removedocument @idoc

IF @@ERROR <> 0 
    -- Return 99 to the calling program to indicate failure.
    RETURN 99
ELSE
    -- Return 0 to the calling program to indicate success.
    RETURN 0

SET NOCOUNT OFF

I now have a stored procedure which accepts 1 parameter of type XML so let’s take a quick look at how to call it from a .Net application.

Putting It All Together

To call the stored procedure from .Net and pass my our XML data is very straight forward.

I’m using the Microsoft Enterprise Library 5.0 as it handles the majority of the database code required to connect to SQL Server and execute the stored procedure.

using Microsoft.Practices.EnterpriseLibrary.Data;
using Microsoft.Practices.EnterpriseLibrary.Data.Sql;
// Custom Dataset Schema Class
//****************************
using Ozcruising.Schemas;
//****************************
using System.Data.Common;
using System;
using System.Data;

class OpenXML
{
    public int UpdatePassengerPaxDetails(dsPassengers ds)
    {
        SqlDatabase db = (SqlDatabase)DatabaseFactory.CreateDatabase();

        // This is the stored procedure we created earlier
        DbCommand dbCommand = 
            db.GetStoredProcCommand("[Person].[usp_UpdatePassengerDetail]");

        // Add a paramter of type XML and assign the 
        // XML from the dsPassengers dataset
        db.AddInParameter(dbCommand, "@PassengerDetail", DbType.Xml, ds.GetXml());
        // Add a ReturnValue paramter of type INT for error checking
        db.AddParameter(dbCommand, "@RETURN_VALUE", DbType.Int16, 0, 
            ParameterDirection.ReturnValue, false, 0, 0, "", 
            DataRowVersion.Current, 0);
            
        // Execute the stored procedure
        db.ExecuteNonQuery(dbCommand);
            
        //Get error indicator
        int erroCode = 
            Convert.ToInt32(db.GetParameterValue(dbCommand, "@RETURN_VALUE"));

        return erroCode;
    }
}

How simple is that!   I’ve replaced 3 calls to the database with one.

Conclusion

As you can see from the above example, using XML to send multiple rows of data to SQL Server is very simple to implement.  That said, there are a number of tips and tricks that can be used to limit the impact shredding XML data has on SQL Server and I’ll discuss these in Part 5 – OpenXML and XQuery Optimisation Tips.

Advertisements

About Mr. Wharty's Ramblings
Jeff holds a Master’s Degree in Systems Development (.NET Stream) and a Master’s Degree in Database Design and Management (SQL Server). Jeff is also a MCT specialising in SQL Server technologies and holds MCPD, MCITP, MCDBA and MCSD certifications.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: