Part 5 – OpenXML and XQuery Optimisation Tips


Introduction

This is the fifth and final post in a series of five on Using TVP’s and XML to Transport Relational Data.

In this post we’ll take a look at some tips and tricks used to increase OpenXML and XQuery performance.

Refresher

The purpose behind this series of posts is to provide .Net Developers with an insight into three technologies that can be used to send multiple rows of data to SQL Server 2008(R2).

In Part 2 and Part 3 we looked at how to use the OpenXML Rowset Provider and XQuery (respectively) to shred an XML document into a rowset, and in Part 4 we looked at Microsoft’s latest offering; Table-Valued Parameters (TVP’s). We also looked at how easy it is to swap between these technologies when using .Net Framework, ADO.NET Datasets and Microsoft’s Enterprise Library 5.0.

OpenXML

I could only find two real performance tips relating to OpenXML (three if you count don’t use it!)

1) A parsed document is stored in the internal cache of SQL Server and the MSXML parser (Msxmlsql.dll) uses one-eighth the total memory available for SQL Server so to avoid running out of memory, execute sp_xml_removedocument as soon as possible.

2) If you need to use the output of the OpenXML function several times in your procedure, populate a table variable with the shredded output and use the table variable instead of using the OpenXML function over and over again.

XQuery

The following performance tips where sourced from Bob Beauchemin’s SQL Server 2008 Microsoft Certified Master (MCM) XML readiness video.

1) Queries that use parent axis generate extra query plan steps so use multiple CROSS APPLY steps to get nodes at multiple nesting levels rather than using the parent axis.

Bad: select o.value('../@id', 'int') as CustID, o.value('@id', 'int') as OrdID from T cross apply x.nodes('/doc/customer/orders') as N(o) Good: select c.value('@id', 'int') as CustID, o.value('@id', 'int') as OrdID from T cross apply x.nodes('/doc/customer') as N1(c) cross apply c.nodes('orders') as N2(o)

2) Move ordinals to the end of path expressions

Bad: 
/book[1]/@isbn

Good: 
(/book/@isbn)[1]

3) Avoid predicates in the middle of path expressions

Bad: 
book[@ISBN = "1-8610-0157-6"]/author[first-name = "Davis"]

Good: 
/book[@ISBN = "1-8610-0157-6"] "n" /book/author[first-name = "Davis"] 

4) Use context item in predicate to lengthen path in exist()

Bad:
SELECT * 
FROM docs 
WHERE 1 = xCol.exist('/book/subject[text() = "security"]') 

Good:
SELECT * 
FROM docs 
WHERE 1 = xCol.exist('/book/subject/text()[. = "security"]') 

5) Casting from XML to SQL

BAD: CAST( CAST(xmldoc.query('/a/b/text()') as nvarchar(500)) as int)
GOOD: xmldoc.value('(/a/b/text())[1]', 'int')

BAD: node.query('.').value('@attr', 'nvarchar(50)')
GOOD: node.value('@attr', 'nvarchar(50)')

6) Use temp table (insert into #temp select … from nodes()) or Table-valued parameter instead of XML

7) Specify a single root node in query as the optimizer assumes that XML can be a fragment.

OpenXML vs. XQuery Performance

There are a ton of forum threads and blog posts out there which have compared the performance of OpenXML and XQuery so I’m not going to post any of my own tests. What I will say though is that OpenXML appears to be faster than XQuery for large documents but slower for small documents. Given this, you’ll probably end up using a mixture of both OpenXML and XQuery (and maybe even a bit of Table-valued Parameters) in your applications.

Part 3 – XQuery: Please don’t use OpenXML!


Introduction

This is the third post in a series of five on Using TVP’s and XML to Transport Relational Data.

In this post we’ll take a look at how to use XQuery within Transact-SQL to shred XML data. We’ll also look at how .Net Developers can use XQuery to reduce the number of SQL Server calls required to handle data modifications.

For details of other posts in this series, please refer to Using TVP’s and XML to Transport Relational Data.

Brief History

XQuery was first introduced in SQL Server 2005 along with the XML Data Type.  Microsoft’s implementation of XQuery was based on the existing XPath query language, with support added for better iteration, better sorting results, and the ability to construct XML.

In SQL Server 2005 there were four XQuery methods (called accessor methods) that could be used with the XML Data Type:

query(): Returns XML data type output

query ('XQuery')

exists(): Returns bit and is used to check based on an expression.

exist (XQuery)

value(): Returns single value from XML, cast to SQL type 

value (XQuery, SQLType)

nodes(): Decomposes XML to a rowset – similar to OpenXML

nodes (XQuery) as Table(Column)

Microsoft added a fifth method (called a Mutator Method) with the release of SQL Server 2008: 

modify(): Performs a modification of XML instance.

modify (insert….) – inserts a single node
modify (replace value of….) – updates a single element/attribute value
modify (delete….) – deletes all nodes that match expression

(Please refer to http://msdn.microsoft.com/en-us/library/ms190798.aspx for full syntax detail and argument options)

In this post I will be using the value() and nodes() methods as all I’m interested in at this stage is demonstrating the basics or shredding XML and storing it in a relational database.

Generating XML with .Net

In Part 2 – OpenXML: What is it and how do I use it? I used an ADO.NET Dataset (dsPassenger) to generate an XML representation of data and then used the OpenXML Rowset Provider to shred this XML into a rowset.  I will be using the exact same XML in this example as it highlights how easy it is to switch between the two technologies, 

Now that I’ve got my XML, I can start work on creating a stored procedure to process the XML (We’ll come back to the .Net side of things a bit later on).

Using XQuery
 
The first step is to create a variable in SQL Server Query Analyser to hold our XML:
 
DECLARE @xml XML = '
<dsPassengers>
  <Passenger>
    <Newsletter>false</Newsletter>
    <PersonID>7a7c18a0-ad47-e011-9c71-00155d991421</PersonID>
    <FirstName>Jeff</FirstName>
    <LastName>Wharton</LastName>
  </Passenger>
  <PAX>
    <PAXID>868b8fde-068a-4468-b9b6-9660166e630d</PAXID>
    <FirstName>Lily</FirstName>
    <LastName>Wharton</LastName>
    <PaxOrder>3</PaxOrder>
  </PAX>
  <PAX>
    <PAXID>bb2770e0-b0e0-479f-8c03-1330ce87cb64</PAXID>
    <FirstName>Ben</FirstName>
    <LastName>Wharton</LastName>
    <PaxOrder>2</PaxOrder>
  </PAX>
</dsPassengers>'

Now I need to write the code that takes the XML in my variable and uses the nodes() and value() methods to shred my XML into relational data:

SELECT P.value('PersonID[1]', 'VARCHAR(50)') PersonID,
	P.value('FirstName[1]', 'VARCHAR(20)') FirstName,
	P.value('LastName[1]', 'VARCHAR(20)') LastName,
	P.value('Newsletter[1]', 'BIT') Newsletter             
FROM @xml.nodes('/dsPassengers/Passenger') AS Passenger(P)

SELECT X.value('PAXID[1]', 'VARCHAR(50)') PAXID,
	X.value('FirstName[1]', 'VARCHAR(20)') FirstName,
	X.value('LastName[1]', 'VARCHAR(20)') LastName,
	X.value('PaxOrder[1]', 'INT') PaxOrder,
	P.value('PersonID[1]', 'VARCHAR(50)') PersonID               
FROM @xml.nodes('/dsPassengers/PAX') AS PAX(X)
CROSS APPLY @xml.nodes('/dsPassengers/Passenger') as Passenger(P) 

That’s it. If I execute the above code, it will return the following:

image

Because XQuery has turned our XML data into a rowset, I can use the results in Transact-SQL statements in which rowset providers such as a table, view, or the OPENROWSET function can appear. For example:

SELECT CruiseCode             
FROM @xml.nodes('/dsPassengers/Passenger') AS Passenger(P)
INNER JOIN Booking.Actual BA ON 
	P.value('PersonID[1]', 'VARCHAR(50)') = BA.PersonID

UPDATE Person.Detail
SET FirstName = P.value('FirstName[1]', 'VARCHAR(20)'),
	LastName = P.value('LastName[1]', 'VARCHAR(20)'),
	Newsletter = P.value('Newsletter[1]', 'BIT')             
FROM @xml.nodes('/dsPassengers/Passenger') AS Passenger(P)
WHERE Person.Detail.PersonID = 
	P.value('PersonID[1]', 'VARCHAR(50)')

image_thumb2

Namespaces

In Part 2 – OpenXML: What is it and how do I use it? I discussed what affect the Namespace property of an ADO.NET Dataset has on shredding XML with the OpenXML Rowset Provider.  Well, it has a similar affect on XQuery.

Here is my XML with a Namespace

<dsPassengers xmlns="http://tempuri.org/dsPassengers.xsd">
  <Passenger>
    <Newsletter>false</Newsletter>
    <PersonID>7a7c18a0-ad47-e011-9c71-00155d991421</PersonID>
    <FirstName>Jeff</FirstName>
    <LastName>Wharton</LastName>
  </Passenger>
</dsPassengers>

I now need to account for this Namespace when I code my XQuery statement

;WITH XMLNAMESPACES(DEFAULT 'http://tempuri.org/dsPassengers.xsd')
SELECT P.value('(PersonID)[1]', 'VARCHAR(50)') PersonID,
	P.value('FirstName[1]', 'VARCHAR(20)') FirstName,
	P.value('LastName[1]', 'VARCHAR(20)') LastName,
	P.value('Newsletter[1]', 'BIT') Newsletter             
FROM @PassengerDetail.nodes('dsPassengers/Passenger') AS Passenger(P)

Nowhere near as many changes needed for XQuery compared to OpenXML.

Stored Procedure

So now that I know my code is working and I’ve removed all the ‘”Namespace” changes, let’s turn it into a Stored Procedure.

CREATE PROC [Person].[usp_UpdatePassengerDetailXQuery]
(           
	@PassengerDetail XML
)
AS

SET NOCOUNT ON 

UPDATE Person.Detail
SET FirstName = P.value('FirstName[1]', 'VARCHAR(20)'),
	LastName = P.value('LastName[1]', 'VARCHAR(20)'),
	Newsletter = P.value('Newsletter[1]', 'BIT')             
FROM @PassengerDetail.nodes('/dsPassengers/Passenger') AS Passenger(P)
WHERE Person.Detail.PersonID = 
	P.value('PersonID[1]', 'VARCHAR(50)') 

UPDATE Person.PAX
SET FirstName = X.value('FirstName[1]', 'VARCHAR(20)'),
	LastName = X.value('LastName[1]', 'VARCHAR(20)'),
	PaxOrder = X.value('PaxOrder[1]', 'INT'),
	PersonID = P.value('PersonID[1]', 'VARCHAR(50)')                
FROM @PassengerDetail.nodes('/dsPassengers/PAX') AS PAXT(X)
CROSS APPLY @PassengerDetail.nodes('/dsPassengers/Passenger') as Passenger(P) 
WHERE Person.PAX.PAXID = X.value('PAXID[1]', 'VARCHAR(50)') 

IF @@ERROR <> 0 
    -- Return 99 to the calling program to indicate failure.
    RETURN 99
ELSE
    -- Return 0 to the calling program to indicate success.
    RETURN 0

SET NOCOUNT OFF

I now have a stored procedure which accepts 1 parameter of type XML so let’s take a quick look at how I call it from a .Net application.

Putting It All Together

The .Net code I need to use to call my stored procedure is exactly the same as discussed in Part 2 – OpenXML: What is it and how do I use it?.  All I need to do is change the stored procedure name from [Person].[usp_UpdatePassengerDetail] to [Person].[usp_UpdatePassengerDetailXQuery]

Conclusion

As you can see from the above example, using XML to send multiple rows of data to SQL Server is very simple to implement. That said, there are a number of tips and tricks that can be used to limit the impact shredding XML data has on SQL Server and I’ll discuss these in Part 5 – OpenXML and XQuery Optimisation Tips.

If you want to learn more about XQuery, have a look at this series XQuery Labs – A Collection of XQuery Sample Scripts

Part 1 – Using TVP’s and XML to Transport Relational Data


Introduction

I’ve been using XML as a means of sending multiple rows of data to SQL Server ever since OpenXML was released in SQL Server 2000.  Although 2000 didn’t have an XML data type, the OpenXML Rowset Provider enabled SQL Server to shred a XML string into a rowset which could then be used in Transact-SQL statements in which rowset providers such as a table, view, or the OPENROWSET function can appear.  Gone were the days of looping through rows of data and sending each row to SQL Server one at a time.  Just when I thought things couldn’t get any better, Microsoft released SQL Server 2005.

One of the new features of 2005 was the introduction of the native XML data type. This new type allowed users to create tables that had one or more columns of type XML in addition to relational columns, create XML variables and parameters, and use a subset of the XQuery Language to query XML data.  The XML data type could also be used with the OpenXML Rowset Provider.  Developers now had two easy methods of sending multiple rows of data to SQL Server.

With the release of SQL Server 2008 came Table-valued parameters (TVP).  Table-valued parameters are declared by using user-defined table types and can be used to send multiple rows of data to a Transact-SQL statement or a routine, such as a stored procedure or function, without creating a temporary table or many parameters.  You guessed it; another method.

The Problem

Developers can now use a number of different methods for sending multiple rows of data to SQL Server and sometimes choosing the correct method to use can be frustrating.  Which one is faster?  Which one is easier to implement?  Can I use a mixture of both? Why isn’t there somewhere I can look for a comparison on all these technologies?  These are some of the questions I’ve been asking myself over the last couple of weeks.

What’s This Post All About?

I originally thought of documenting many of the things I discovered about these technologies in a word document to be stored away and potentially lost over the coming months.  I also thought about throwing it all away and sticking with my current method (OpenXML) as I found I had more question than answers and it was all starting to sound too difficult.  In the end I was convinced (by a colleague) to blog about my findings.

The purpose of this series of posts is to provide other developers, DBD’s and DBA’s with an insight into the above mentioned technologies and hopefully assist them with choosing and implementing a technology (or technologies) to use within their applications.  I’m also hoping that others will provide comment on their experiences with these technologies.

Scope

OpenXML:  What is it and how do I use it?
XQuery: Please don’t use OpenXML!
Table-Valued Parameters (TVP’s): Table what?
OpenXML and XQuery Optimisation Tips

Final Word

Estimated post dates:

OpenXML: 27th May 2011
XQuery: 28th May 2011
Table-Valued Parameters (TVP’s): 29th May 2011
OpenXML and XQuery Optimisation Tips: 12th June 2011

%d bloggers like this: