Fun with SSIS and {GUID’s}


Task

Create a SQL Server Integration Services (SSIS) package that accepts a Globally Unique Identifier (GUID) as a variable and uses this variable as:

Problem

Although SSIS supports the GUID data type (DT_GUID) and although you can use a Data Conversion Transformation to transform a string representation of GUID into the DT_GUID data type, there is no GUID data type that can be assigned to SSIS variables.

“Not a problem” I hear you say, “Just use a STRING data type and SQL Server will implicitly convert it to a GUID”“You wish!” says SSIS

Solution

This should have been an easy solution to implement so I can’t believe I wasted an hour trying to get things working. So what caused all my pain? SSIS’s {perceived}lack of support for GUID’s variables.

Steps to implement solution:

1. Create a SSIS variable of type STRING

2. Add an Execute SQL Task to execute a SQL Server stored procedure using the SSIS variable as a parameter

3. Add a Data Flow Task

4. Add an OLE DB Source with a Data Access Mode of SQL Command and a SQL Command Text which uses the SSIS variable as a parameter

5. Add an OLE DB Destination to pipe the result from the OLE DB Source into a SQL Server table.

6. Run the package

7. Debugging

Step 1 – Create SSIS Variable

MSDN Link: How to: Add or Delete a Variable in a Package by Using the Variables Window

Here I’ve created a variable called ProjectUID of type STRING and assigned it a default value to assist with debugging.

image

Step 2 – Create Execute SQL Task

MSDN Link: Working with Parameters and Return Codes in the Execute SQL Task

You can see from the first image below that stored procedure usp_DeleteEMPImportTablesByProjectUID accepts a parameter called @ProjectUID (of data type UNIQUEIDENTIFIER) and it’s value has been set to “?”.  This tells the Execute SQL Task that a parameter mapping is being used to provide the value.

The second image shows how to map the SSIS variable created in step 1 to the @ProjectUID parameter

The connection manager created for this task is using the Native OLE DB\SQL Native Client provider.

image

image

Step 3 – Create Data Flow Task

MSND Link: Adding a Data Flow Task

Nothing add here

image

Step 4 – Add an OLE DB Source

MSDN Link: How to: Extract Data by Using the OLE DB Source

MSDN Link: How to: Map Query Parameters to Variables in a Data Flow Component

Here I add an OLE DB Source and set its Data Access Mode to SQL Command.  I have also added my SQL Statement which filters results by ProjectUID.  As with the Execute SQL Task, the “?” means that a parameter mapping is being used to provide the value.

Clicking on the Parameters button launches the Set Query Parameter dialog (image 2) where the SSIS variable can be mapped.

The connection manager used for this data source is the same as that used for the Execute SQL Task i.e. Native OLE DB\SQL Native Client provider.

image

image

Step 5 – Add an OLE DB Destination

MSDN Link: How to: Load Data by Using the OLE DB Destination

Nothing add here either

image

Step 6 – Run the Package

So I ran the package and as you can see from the images below, the OLE DD Source failed with the following error:

[OLE DB Source [5018]] Error: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80040E21. An OLE DB record is available.  Source: “Microsoft SQL Native Client”  Hresult: 0x80040E21  Description: “Invalid character value for cast specification”.

Looking into this further shows that it failed on casting the GUID string value (66EF3876-E8CF-45F2-B2D5-8DDFF60B46D3) to a GUID data type.  But how is this so as the Execute SQL Task succeeded and it’s using the same value?

imageimage

Step 7 – Debugging

The first step in the debugging process was to remove the parameter mapping for the OLE DB Source and use the GUID value directly:

WHERE     (ea.ProjectUID = ’66EF3876-E8CF-45F2-B2D5-8DDFF60B46D3′)

As you can see, the package runs successfully.  This is what had me stumped for a while.

Anyway, to cut a long storey short, a STRING representation of a GUID needs to be formatted as {66EF3876-E8CF-45F2-B2D5-8DDFF60B46D3} i.e. enclose the GUID in curly brackets (which are optional by the way)

This format is accepted by both the Execute SQL Task and the OLE DB Source.

imageimage

Conclusion

I’m not sure why the curly brackets are required for an OLE DB Source and not an Execute SQL Task (especially when they’re using the exact same provider) as I haven’t researched it yet.

Free SQL Server 2008 R2 Training


There’s been an update to the SQL Server 2008 R2 Developers Training Kit (May 2011).

This training kit is a great resource for developers, trainers, consultants and evangelists who need to understand SQL Server 2008 and SQL Server 2008 R2 improvements from a developer perspective. It contains a rich set of presentations, demos, hands-on labs and videos that are perfect for self-paced learning or for conducting your own training.

There’s also some SQL Server Developer content in the Visual Studio 2010 and .NET Framework 4 Training Course.

In this introductory unit, the Visual Studio 2010 and .NET Framework 4 evangelism team welcomes you to the training course and goes over the significant changes to Visual Studio 2010 and the .NET Framework 4. We’ll also cover what you’ll see in the course and how to follow along with the labs.

If you’re into Business Intelligence, check out the SQL Server 2008 R2 Business Intelligence Training Course. (a.k.a. SQLInsights).

The SQL Server 2008 R2 Business Intelligence Training Course (SQLInsights) will help you learn how to build your first Microsoft Business Intelligence solution. SQLInsights is a great resource for developers, consultants and evangelists who need to learn the basics of how to build solutions that deliver integrated reporting, analytics and dashboard experiences. It contains a rich set of videos and hands-on labs that are perfect for self-paced learning.

Sargable Predicates


After a small slip-up on the SQL Server Forums, I thought it best to remind myself (and others) of the importance of using Sargable predicates in WHERE clauses.

Definition: In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able (Wikipedia, 2011, en.wikipedia.org/wiki/Sargable)

So what does that actually mean? This can be answered with a couple of very simple examples however there are two terms we need to look at first; Index Scan and Index Seek.

Index Scan: An Index Scan touches every row in a table whether or not it qualifies, thus its cost is proportional to the total number of rows in the table.  A scan is an efficient strategy if the table is small or if most of the rows in a table qualify for the predicate.

Index Seek: An Index Seek only touches rows that qualify and pages that contain these qualifying rows, thus its cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table. A seek is generally preferred for highly selective queries i.e. the query is requesting a small number of rows.

This is where the importance of sargable predicates come into play. 

If a WHERE clause predicate is not sargable, the Query Optimizer will ignore any indexes in the clause and force an Index Scan.  If a WHERE clause predicate is sargable, the Query Optimizer will try and utilise any indexes in the clause hopefully resulting in an Index Seek and faster query performance.. 

Let’s look at a couple of examples.

Example 1

Requirement: Return a list of bookings where the cruise departure date is at least 21 days away.

Typical Solution: Use the DATEDIFF function to return the number of days between the current date and departure date and then validate if the returned value is >= 21.

Better Solution: Use the DATEADD function to add 21 days to the current date and then check if the departure date is >=  to the date returned from the DATEADD function.

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE DATEDIFF(day, GETDATE(),DepartureDate) >= 21

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE DepartureDate >= DATEADD(day,21,CAST(GETDATE() AS DATE))

image

Both queries return the same number of rows (6,787 out of 53,613) however Query 1 uses an Index Scan and

Query 2 uses a Index Seek.  Since a scan touches every row in a table (in this case 53,613) and a seek only touches qualifying rows (in this case 6,787), Query 1 had a 7% cost penalty over Query 2.

Example 2

This is the one that always gets me (and the one that tripped me up on the forums) as there’s a perception that using LIKE in a WHERE clause is bad news.

Requirement: Return a list of bookings where the cruise name starts with Hawaii.

Typical Solution: Use the SUBSTRING function to return the first 6 characters of the cruise name and then validate if these 6 characters equal Hawaii.

Better Solution: Use the LIKE operand with the % wildcard character to validate if the cruise name starts with Hawaii.

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE SUBSTRING(CruiseName,1,6) = 'Hawaii'

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE CruiseName LIKE 'Hawaii%'

image

Both queries return the same number or rows (452 out of 53613) and like example 1, Query 1 uses an Index Scan and Query 2 uses a Index seek.  In this example though, Query 1 resulted in a 11% cost penalty over Query 2. 

Given these results, what do you think would happen if we doubled the number of cruises in the Cruise.Detail table and would it make any difference if we increased the number of data matches i.e. added more cruises containing Hawaii compared to adding more cruises not containing Hawaii?

  Query 1 (Non-sargable predicate) Query 2 (Sargable predicate)

Hawaii added

904 out of 10916

image image

No Hawaiii added

452 out of 10916

image image

As you can see, the cost penalty for Query 1 has increased substantially as the query optimizer has ignored the IX_CruiseName index and forced a scan.  The increase is even more pronounced when there are less data matches so imagine what the performance hit would be if this table contained 1+ million rows.

Example 3

Requirement: Return a list of booking id’s where the booked cruise departs in 2010.

Typical Solution: Use the YEAR function to convert the departure date into a year and then validate if the returned year equals 2010.

Better Solution: Use the BETWEEN logical operator to validate if the departure date is between the dates  2010/01/01 and 2010/12/31.

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE Year(DepartureDate) = 2010

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE DepartureDate BETWEEN CAST('2010/01/01' AS DATE) 
	AND CAST('2010/12/31' AS DATE)

image

Once again both queries return the same number or rows (20643 out of 53613) and like the other examples, Query 1 uses an Index Scan and Query 2 uses a Index seek. In this example, Query 1 resulted in a 11% cost penalty over Query 2.

Example 4

This final example is based on Data Type Precedence.

When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.  Therefore, if a column in a WHERE clause is converted to a different data type, any indexes on that column cannot be used thus forcing an Index Scan.

Requirement: Return a list of bookings for people who have a home phone area code of 02 (In this example, the home phone area code column is of type VARCHAR to facilitate the storing of a leading zero)

Typical Solution: Compare the home phone area code with the numeric value of 02

Better Solution: Compare the home phone area code with the string value of ‘02’

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Person.Detail PD
ON PD.PersonID = BR.PersonID
WHERE HomeAreaCode = 02

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Person.Detail PD
ON PD.PersonID = BR.PersonID
WHERE HomeAreaCode = '02'

image

This is a very simple example however it highlights the importance of choosing the correct data type for your columns.

Conclusion

In this post I’ve illustrated the importance of using Sargable predicates and hopefully provided you with an insight into how to ensure that your queries perform optimally.

Part 5 – OpenXML and XQuery Optimisation Tips


Introduction

This is the fifth and final post in a series of five on Using TVP’s and XML to Transport Relational Data.

In this post we’ll take a look at some tips and tricks used to increase OpenXML and XQuery performance.

Refresher

The purpose behind this series of posts is to provide .Net Developers with an insight into three technologies that can be used to send multiple rows of data to SQL Server 2008(R2).

In Part 2 and Part 3 we looked at how to use the OpenXML Rowset Provider and XQuery (respectively) to shred an XML document into a rowset, and in Part 4 we looked at Microsoft’s latest offering; Table-Valued Parameters (TVP’s). We also looked at how easy it is to swap between these technologies when using .Net Framework, ADO.NET Datasets and Microsoft’s Enterprise Library 5.0.

OpenXML

I could only find two real performance tips relating to OpenXML (three if you count don’t use it!)

1) A parsed document is stored in the internal cache of SQL Server and the MSXML parser (Msxmlsql.dll) uses one-eighth the total memory available for SQL Server so to avoid running out of memory, execute sp_xml_removedocument as soon as possible.

2) If you need to use the output of the OpenXML function several times in your procedure, populate a table variable with the shredded output and use the table variable instead of using the OpenXML function over and over again.

XQuery

The following performance tips where sourced from Bob Beauchemin’s SQL Server 2008 Microsoft Certified Master (MCM) XML readiness video.

1) Queries that use parent axis generate extra query plan steps so use multiple CROSS APPLY steps to get nodes at multiple nesting levels rather than using the parent axis.

Bad: select o.value('../@id', 'int') as CustID, o.value('@id', 'int') as OrdID from T cross apply x.nodes('/doc/customer/orders') as N(o) Good: select c.value('@id', 'int') as CustID, o.value('@id', 'int') as OrdID from T cross apply x.nodes('/doc/customer') as N1(c) cross apply c.nodes('orders') as N2(o)

2) Move ordinals to the end of path expressions

Bad: 
/book[1]/@isbn

Good: 
(/book/@isbn)[1]

3) Avoid predicates in the middle of path expressions

Bad: 
book[@ISBN = "1-8610-0157-6"]/author[first-name = "Davis"]

Good: 
/book[@ISBN = "1-8610-0157-6"] "n" /book/author[first-name = "Davis"] 

4) Use context item in predicate to lengthen path in exist()

Bad:
SELECT * 
FROM docs 
WHERE 1 = xCol.exist('/book/subject[text() = "security"]') 

Good:
SELECT * 
FROM docs 
WHERE 1 = xCol.exist('/book/subject/text()[. = "security"]') 

5) Casting from XML to SQL

BAD: CAST( CAST(xmldoc.query('/a/b/text()') as nvarchar(500)) as int)
GOOD: xmldoc.value('(/a/b/text())[1]', 'int')

BAD: node.query('.').value('@attr', 'nvarchar(50)')
GOOD: node.value('@attr', 'nvarchar(50)')

6) Use temp table (insert into #temp select … from nodes()) or Table-valued parameter instead of XML

7) Specify a single root node in query as the optimizer assumes that XML can be a fragment.

OpenXML vs. XQuery Performance

There are a ton of forum threads and blog posts out there which have compared the performance of OpenXML and XQuery so I’m not going to post any of my own tests. What I will say though is that OpenXML appears to be faster than XQuery for large documents but slower for small documents. Given this, you’ll probably end up using a mixture of both OpenXML and XQuery (and maybe even a bit of Table-valued Parameters) in your applications.

Exam 70-454: Transition Your MCITP SQL Server 2005 DBD to MCITP SQL Server 2008 DBD


Just a quick post on some of the resources I’ve been using to prepare for exam 70-454.

SQL Server 2008 Microsoft Certified Master (MCM) Readiness Videos

Although these readiness videos were designed to provide SQL Server 2008 Microsoft Certified Master candidates with an overview of what the MCM covers, they also contain a wealth of information on content covered in other SQL Server 2008 certifications.  Each module also provides a comprehensive list of additional readiness resources such as whitepapers, blog posts, other videos, and applicable SQL Server Books Online sections.

This web page contains links to further study aids for each of the online training modules.

Training Kits

SQL Server 2008 R2 Developers Training Kit

Virtual Labs

Webcasts

Microsoft e-Learning/Tutorials

Books

Articles

    SelfTest Software

%d bloggers like this: