Sargable Predicates

After a small slip-up on the SQL Server Forums, I thought it best to remind myself (and others) of the importance of using Sargable predicates in WHERE clauses.

Definition: In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able (Wikipedia, 2011, en.wikipedia.org/wiki/Sargable)

So what does that actually mean? This can be answered with a couple of very simple examples however there are two terms we need to look at first; Index Scan and Index Seek.

Index Scan: An Index Scan touches every row in a table whether or not it qualifies, thus its cost is proportional to the total number of rows in the table.  A scan is an efficient strategy if the table is small or if most of the rows in a table qualify for the predicate.

Index Seek: An Index Seek only touches rows that qualify and pages that contain these qualifying rows, thus its cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table. A seek is generally preferred for highly selective queries i.e. the query is requesting a small number of rows.

This is where the importance of sargable predicates come into play. 

If a WHERE clause predicate is not sargable, the Query Optimizer will ignore any indexes in the clause and force an Index Scan.  If a WHERE clause predicate is sargable, the Query Optimizer will try and utilise any indexes in the clause hopefully resulting in an Index Seek and faster query performance.. 

Let’s look at a couple of examples.

Example 1

Requirement: Return a list of bookings where the cruise departure date is at least 21 days away.

Typical Solution: Use the DATEDIFF function to return the number of days between the current date and departure date and then validate if the returned value is >= 21.

Better Solution: Use the DATEADD function to add 21 days to the current date and then check if the departure date is >=  to the date returned from the DATEADD function.

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE DATEDIFF(day, GETDATE(),DepartureDate) >= 21

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE DepartureDate >= DATEADD(day,21,CAST(GETDATE() AS DATE))

image

Both queries return the same number of rows (6,787 out of 53,613) however Query 1 uses an Index Scan and

Query 2 uses a Index Seek.  Since a scan touches every row in a table (in this case 53,613) and a seek only touches qualifying rows (in this case 6,787), Query 1 had a 7% cost penalty over Query 2.

Example 2

This is the one that always gets me (and the one that tripped me up on the forums) as there’s a perception that using LIKE in a WHERE clause is bad news.

Requirement: Return a list of bookings where the cruise name starts with Hawaii.

Typical Solution: Use the SUBSTRING function to return the first 6 characters of the cruise name and then validate if these 6 characters equal Hawaii.

Better Solution: Use the LIKE operand with the % wildcard character to validate if the cruise name starts with Hawaii.

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE SUBSTRING(CruiseName,1,6) = 'Hawaii'

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE CruiseName LIKE 'Hawaii%'

image

Both queries return the same number or rows (452 out of 53613) and like example 1, Query 1 uses an Index Scan and Query 2 uses a Index seek.  In this example though, Query 1 resulted in a 11% cost penalty over Query 2. 

Given these results, what do you think would happen if we doubled the number of cruises in the Cruise.Detail table and would it make any difference if we increased the number of data matches i.e. added more cruises containing Hawaii compared to adding more cruises not containing Hawaii?

  Query 1 (Non-sargable predicate) Query 2 (Sargable predicate)

Hawaii added

904 out of 10916

image image

No Hawaiii added

452 out of 10916

image image

As you can see, the cost penalty for Query 1 has increased substantially as the query optimizer has ignored the IX_CruiseName index and forced a scan.  The increase is even more pronounced when there are less data matches so imagine what the performance hit would be if this table contained 1+ million rows.

Example 3

Requirement: Return a list of booking id’s where the booked cruise departs in 2010.

Typical Solution: Use the YEAR function to convert the departure date into a year and then validate if the returned year equals 2010.

Better Solution: Use the BETWEEN logical operator to validate if the departure date is between the dates  2010/01/01 and 2010/12/31.

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE Year(DepartureDate) = 2010

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE DepartureDate BETWEEN CAST('2010/01/01' AS DATE) 
	AND CAST('2010/12/31' AS DATE)

image

Once again both queries return the same number or rows (20643 out of 53613) and like the other examples, Query 1 uses an Index Scan and Query 2 uses a Index seek. In this example, Query 1 resulted in a 11% cost penalty over Query 2.

Example 4

This final example is based on Data Type Precedence.

When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.  Therefore, if a column in a WHERE clause is converted to a different data type, any indexes on that column cannot be used thus forcing an Index Scan.

Requirement: Return a list of bookings for people who have a home phone area code of 02 (In this example, the home phone area code column is of type VARCHAR to facilitate the storing of a leading zero)

Typical Solution: Compare the home phone area code with the numeric value of 02

Better Solution: Compare the home phone area code with the string value of ‘02’

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Person.Detail PD
ON PD.PersonID = BR.PersonID
WHERE HomeAreaCode = 02

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Person.Detail PD
ON PD.PersonID = BR.PersonID
WHERE HomeAreaCode = '02'

image

This is a very simple example however it highlights the importance of choosing the correct data type for your columns.

Conclusion

In this post I’ve illustrated the importance of using Sargable predicates and hopefully provided you with an insight into how to ensure that your queries perform optimally.

Advertisements

About Mr. Wharty's Ramblings
Jeff holds a Master’s Degree in Systems Development (.NET Stream) and a Master’s Degree in Database Design and Management (SQL Server). Jeff also holds MCPD, MCITP, MCDBA, MCSD and MCSE certifications.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: