SQL Server Index Fragmentation Script


Just sharing a cool script I stumbled across while searching for SQL Server index fragmentation scripts (Omer van Kloeten’s Blog). I’ve cleaned it up a bit and turned it into a Stored Procedure so that I can execute it as a SQL Server Agent Job (Not sure if this is a good idea or not so feel free to leave comments).

So why do we need to worry about Index Fragmentation?

*The SQL Server Database Engine automatically maintains indexes whenever insert, update, or delete operations are made to the underlying data. Over time these modifications can cause the information in the index to become scattered in the database (fragmented). Fragmentation exists when indexes have pages in which the logical ordering, based on the key value, does not match the physical ordering inside the data file. Heavily fragmented indexes can degrade query performance and cause your application to respond slowly. For more information, see this Microsoft Web site.

Please Note: Defragmentation does not yield performance gains in every case. Every scenario is different. Similarly, determining when you should run the defragmentation statements requires analysis.

*Reorganizing and Rebuilding Indexes (http://technet.microsoft.com/en-us/library/ms189858.aspx)

MSDN Gallery: http://code.msdn.microsoft.com/SQL-Server-Index-6c12e7d4

CREATE PROC usp_FixIndexFragmentation
(
    /* Use online rebuild if detected  edition is Enterprise, Developer or Evaluation.*/
    @Online BIT = 1,

    /* Log recommendations Only */    
    @LogOnly BIT = 1,

    /* Specifies a percentage that indicates how full the Database Engine should 
    make the leaf level of each index page during index creation or alteration. 
    Fillfactor must be an integer value from 1 to 100. The default is 0. */
    @FillFactor TINYINT = 50,

   /* Specifies the scan level that is used to obtain statistics. Valid inputs are 
    DEFAULT, NULL, LIMITED, SAMPLED, or DETAILED. */
    @Index_physical_stats_mode VARCHAR(20) = 'Limited',


    /* Total number of index or data pages. */
    @Index_physical_stats_page_count BIGINT = 500,


    /* Logical fragmentation for indexes, or extent fragmentation for heaps in the 
    IN_ROW_DATA allocation unit. The value is measured as a percentage and 
    takes into account multiple files */
    @Avg_fragmentation_percent TINYINT = 40
)
AS
BEGIN
    SET NOCOUNT ON

    DEClARE @Rows BIGINT = 0
    DEClARE @Counter BIGINT = 1
    DECLARE @TSQLStatement NVARCHAR(MAX)

    IF OBJECT_ID('tempdb..#TmpTable') IS NOT NULL 
        DROP TABLE #TmpTable 
          
    CREATE TABLE #TmpTable 
    (
        RowNo BIGINT IDENTITY(1,1), TSQLStatement VARCHAR(MAX)
    )
          
    IF NOT EXISTS 
    (
        SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[LogTable]') 
        AND type in (N'U')
    )
    BEGIN
        CREATE TABLE [dbo].[LogTable]
        (
            [TSQLStatement] [varchar](max) NULL,
            [FRAGMENTATION] [float] NULL,
            [Executed] [datetime] NULL,
            [ExecutedBy] [varchar](100) NULL
        ) ON [PRIMARY]
    END
                    
    INSERT INTO LogTable
    OUTPUT INSERTED.TSQLStatement INTO #TmpTable
    SELECT    'ALTER INDEX [' + ix.name + '] ON [' + s.name + '].[' + t.name + '] ' +
    CASE
        WHEN ps.avg_fragmentation_in_percent > @Avg_fragmentation_percent THEN
        CASE
            WHEN @Online = 1 AND 
                (
                    CAST(SERVERPROPERTY('Edition') AS VARCHAR) LIKE 'Enterprise%'
                    OR
                    CAST(SERVERPROPERTY('Edition') AS VARCHAR) LIKE 'Developer%'

                    OR
                    CAST(SERVERPROPERTY('Edition') AS VARCHAR) LIKE 'Evaluation%'
                )
                THEN 'REBUILD WITH (ONLINE = ON, FILLFACTOR = ' 
                    + CAST(@FillFactor AS VARCHAR(10)) + ') '
            ELSE 'REBUILD WITH (FILLFACTOR = ' 
                + CAST(@FillFactor AS VARCHAR(10)) + ') '
        END
        ELSE 'REORGANIZE '
    END +
    CASE
        WHEN pc.partition_count > 1 
            THEN ' PARTITION = ' + CAST(ps.partition_number AS NVARCHAR(MAX))
        ELSE ''
    END,
    PS.AVG_FRAGMENTATION_IN_PERCENT, GETDATE(), SUSER_NAME()
    FROM sys.indexes AS ix INNER JOIN sys.tables t
        ON t.object_id = ix.object_id     INNER JOIN sys.schemas s
        ON t.schema_id = s.schema_id INNER JOIN 
        (         
            SELECT object_id, index_id, avg_fragmentation_in_percent, partition_number
            FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 
                @Index_physical_stats_mode) 
            WHERE page_count > @Index_physical_stats_page_count
        ) ps
            ON t.object_id = ps.object_id AND ix.index_id = ps.index_id INNER JOIN 
    (
        SELECT object_id, index_id, COUNT(DISTINCT partition_number) AS partition_count
        FROM sys.partitions
        GROUP BY  object_id, index_id
    ) pc
            ON t.object_id = pc.object_id AND ix.index_id = pc.index_id
    WHERE ps.avg_fragmentation_in_percent > 10 AND ix.name IS NOT NULL
    ORDER BY  ps.avg_fragmentation_in_percent DESC

    SET @Rows = @@ROWCOUNT

    IF @LogOnly = 0
    BEGIN
        WHILE (@Counter <= @Rows)
            BEGIN
            SELECT @TSQLStatement = TSQLStatement
            FROM #TmpTable
            WHERE RowNo = @Counter

            EXECUTE sp_executesql @TSQLStatement

            SET @Counter += 1
            END
    END
    
  SET NOCOUNT OFF
END

Sargable Predicates


After a small slip-up on the SQL Server Forums, I thought it best to remind myself (and others) of the importance of using Sargable predicates in WHERE clauses.

Definition: In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able (Wikipedia, 2011, en.wikipedia.org/wiki/Sargable)

So what does that actually mean? This can be answered with a couple of very simple examples however there are two terms we need to look at first; Index Scan and Index Seek.

Index Scan: An Index Scan touches every row in a table whether or not it qualifies, thus its cost is proportional to the total number of rows in the table.  A scan is an efficient strategy if the table is small or if most of the rows in a table qualify for the predicate.

Index Seek: An Index Seek only touches rows that qualify and pages that contain these qualifying rows, thus its cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table. A seek is generally preferred for highly selective queries i.e. the query is requesting a small number of rows.

This is where the importance of sargable predicates come into play. 

If a WHERE clause predicate is not sargable, the Query Optimizer will ignore any indexes in the clause and force an Index Scan.  If a WHERE clause predicate is sargable, the Query Optimizer will try and utilise any indexes in the clause hopefully resulting in an Index Seek and faster query performance.. 

Let’s look at a couple of examples.

Example 1

Requirement: Return a list of bookings where the cruise departure date is at least 21 days away.

Typical Solution: Use the DATEDIFF function to return the number of days between the current date and departure date and then validate if the returned value is >= 21.

Better Solution: Use the DATEADD function to add 21 days to the current date and then check if the departure date is >=  to the date returned from the DATEADD function.

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE DATEDIFF(day, GETDATE(),DepartureDate) >= 21

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE DepartureDate >= DATEADD(day,21,CAST(GETDATE() AS DATE))

image

Both queries return the same number of rows (6,787 out of 53,613) however Query 1 uses an Index Scan and

Query 2 uses a Index Seek.  Since a scan touches every row in a table (in this case 53,613) and a seek only touches qualifying rows (in this case 6,787), Query 1 had a 7% cost penalty over Query 2.

Example 2

This is the one that always gets me (and the one that tripped me up on the forums) as there’s a perception that using LIKE in a WHERE clause is bad news.

Requirement: Return a list of bookings where the cruise name starts with Hawaii.

Typical Solution: Use the SUBSTRING function to return the first 6 characters of the cruise name and then validate if these 6 characters equal Hawaii.

Better Solution: Use the LIKE operand with the % wildcard character to validate if the cruise name starts with Hawaii.

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE SUBSTRING(CruiseName,1,6) = 'Hawaii'

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE CruiseName LIKE 'Hawaii%'

image

Both queries return the same number or rows (452 out of 53613) and like example 1, Query 1 uses an Index Scan and Query 2 uses a Index seek.  In this example though, Query 1 resulted in a 11% cost penalty over Query 2. 

Given these results, what do you think would happen if we doubled the number of cruises in the Cruise.Detail table and would it make any difference if we increased the number of data matches i.e. added more cruises containing Hawaii compared to adding more cruises not containing Hawaii?

  Query 1 (Non-sargable predicate) Query 2 (Sargable predicate)

Hawaii added

904 out of 10916

image image

No Hawaiii added

452 out of 10916

image image

As you can see, the cost penalty for Query 1 has increased substantially as the query optimizer has ignored the IX_CruiseName index and forced a scan.  The increase is even more pronounced when there are less data matches so imagine what the performance hit would be if this table contained 1+ million rows.

Example 3

Requirement: Return a list of booking id’s where the booked cruise departs in 2010.

Typical Solution: Use the YEAR function to convert the departure date into a year and then validate if the returned year equals 2010.

Better Solution: Use the BETWEEN logical operator to validate if the departure date is between the dates  2010/01/01 and 2010/12/31.

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE Year(DepartureDate) = 2010

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Cruise.Detail CD
ON CD.CruiseCode = BR.CruiseCode
WHERE DepartureDate BETWEEN CAST('2010/01/01' AS DATE) 
	AND CAST('2010/12/31' AS DATE)

image

Once again both queries return the same number or rows (20643 out of 53613) and like the other examples, Query 1 uses an Index Scan and Query 2 uses a Index seek. In this example, Query 1 resulted in a 11% cost penalty over Query 2.

Example 4

This final example is based on Data Type Precedence.

When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.  Therefore, if a column in a WHERE clause is converted to a different data type, any indexes on that column cannot be used thus forcing an Index Scan.

Requirement: Return a list of bookings for people who have a home phone area code of 02 (In this example, the home phone area code column is of type VARCHAR to facilitate the storing of a leading zero)

Typical Solution: Compare the home phone area code with the numeric value of 02

Better Solution: Compare the home phone area code with the string value of ‘02’

Query 1 (Non-sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Person.Detail PD
ON PD.PersonID = BR.PersonID
WHERE HomeAreaCode = 02

image

Query 2 (Sargable predicate)

SELECT BR.OzBookingID
FROM Booking.Request BR INNER JOIN Person.Detail PD
ON PD.PersonID = BR.PersonID
WHERE HomeAreaCode = '02'

image

This is a very simple example however it highlights the importance of choosing the correct data type for your columns.

Conclusion

In this post I’ve illustrated the importance of using Sargable predicates and hopefully provided you with an insight into how to ensure that your queries perform optimally.

%d bloggers like this: