Replace the poorly performing SUBSTRING function in SQL Server with the LIKE statement

Brett Hawton – 8 December 2010

In a previous article I had shown that embedding an indexed column in a function prevented SQL server from performing an efficient index seek and instead resulted in a far slower index scan. I had shown many techniques to extricate the column out of the function so as to allow the index seek to occur on SQL Server.

During the PASS summit in Seattle I had some lively and enjoyable conversations with a number of DBA’s asking if this held true for the SUBSTRING function and also if taking the column out of the function was still beneficial if that column was not part of an index in the first place. I promised a battery of tests and here they are:

Before we get started, whenever I run comparisons like this I always like to raise the thread priority of SQL Server so that activities from other services and programs don’t cause inconsistent results:

Ok now with that done lets first compare a SUBSTRING function with the equivalent LIKE statement when an index is indeed present as we should see the same benefits the previous article had described in taking say WHERE SUBSTRING(Title,1,1) = ‘J’ and changing it to …WHERE Title LIKE ‘J%’:

First let’s create a relevant index on a table to suite our purposes:

USE [AdventureWorks]
GO
CREATE NONCLUSTERED INDEX [AK_Employee_Title] ON [HumanResources].[Employee]
([Title] ASC)
WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

The two query plans produced show that the LIKE statement is only slightly faster than the SUBSTRING function for an indexed column:

However it’s best to run the two TSQL statements in a loop many times (200,000) in order to see the exact difference in a real-world performance scenario:

USE [AdventureWorks]
GO
SET NOCOUNT ON
DECLARE @StartTime1 datetime, @StartTime2 datetime,
	@EndTime1 datetime, @EndTime2 datetime, @Counter int, @V int
SELECT @V = MAX(ManagerID) -- run it once before starting to cache everything needed
			FROM HumanResources.Employee
			WHERE SUBSTRING(Title,1,1) = 'J'
SELECT @StartTime1 = GETDATE(), @Counter = 0
WHILE @Counter < 200000 -- Run SUBSTRING 200,000 times
        BEGIN
        SELECT @V = MAX(ManagerID)
			FROM HumanResources.Employee
			WHERE SUBSTRING(Title,1,1) = 'J'
        SET @Counter = @Counter + 1
        END
SELECT @EndTime1 = GETDATE()

SELECT @StartTime2 = GETDATE(), @Counter = 0
WHILE @Counter < 200000 -- Run LIKE 200,000 times
        BEGIN
        SELECT @V = MAX(ManagerID)
			FROM HumanResources.Employee
			WHERE Title LIKE 'J%'
        SET @Counter = @Counter + 1
        END
SELECT @EndTime2 = GETDATE()
SELECT 'Run time of SUBSTRING' = DATEDIFF(ms, @StartTime1, @EndTime1),
        'Run time of LIKE' = DATEDIFF(ms, @StartTime2, @EndTime2),
        'Improvement of LIKE over SUBSTRING (covered)' = CONVERT(varchar(9), (CONVERT(DEC(6,2), ((DATEDIFF(ms, @StartTime1, @EndTime1) * 1.00) / DATEDIFF(ms, @StartTime2, @EndTime2))))) + ' X improvement'
GO

Rather than the few percent performance improvement that the query plans had indicated, the performance improvement when switching from SUBSTRING to LIKE for an indexed column on a non-covered query is almost three times faster:

So what about a covered query? Do the gains LIKE gets over SUBSTRING continue in that scenario?

According to the query plan, LIKE performs even better over a SUBSTRING on a covered query than it does when the query is not covered:

However as before, let’s run the two covered queries in a loop to check real-world performance:

USE [AdventureWorks]
GO
SET NOCOUNT ON
DECLARE @StartTime1 datetime, @StartTime2 datetime,
	@EndTime1 datetime, @EndTime2 datetime, @Counter int, @V int
SELECT @V = COUNT(*) -- run it once before starting to cache everything needed
			FROM HumanResources.Employee
			WHERE SUBSTRING(Title,1,1) = 'J'
SELECT @StartTime1 = GETDATE(), @Counter = 0
WHILE @Counter < 200000 -- Run SUBSTRING 200,000 times
        BEGIN
        SELECT @V = COUNT(*)
			FROM HumanResources.Employee
			WHERE SUBSTRING(Title,1,1) = 'J'
        SET @Counter = @Counter + 1
        END
SELECT @EndTime1 = GETDATE()

SELECT @StartTime2 = GETDATE(), @Counter = 0
WHILE @Counter < 200000 -- Run LIKE 200,000 times
        BEGIN
        SELECT @V = COUNT(*)
			FROM HumanResources.Employee
			WHERE Title LIKE 'J%'
        SET @Counter = @Counter + 1
        END
SELECT @EndTime2 = GETDATE()
SELECT 'Run time of SUBSTRING' = DATEDIFF(ms, @StartTime1, @EndTime1),
        'Run time of LIKE' = DATEDIFF(ms, @StartTime2, @EndTime2),
        'Improvement of LIKE over SUBSTRING (covered)' = CONVERT(varchar(9), (CONVERT(DEC(6,2), ((DATEDIFF(ms, @StartTime1, @EndTime1) * 1.00) / DATEDIFF(ms, @StartTime2, @EndTime2))))) + ' X improvement'
GO

As you can see below the performance increase that a LIKE statement gets over a SUBSTRING function has widened to 5 times faster when the query is covered!

So what about the performance of SUBSTRING vs LIKE when the column is not indexed?

First let’s drop that index we created:

USE [AdventureWorks]
GO
IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[HumanResources].[Employee]') AND name = N'AK_Employee_Title')
DROP INDEX [AK_Employee_Title] ON [HumanResources].[Employee] WITH ( ONLINE = OFF )
GO

Now, according to the query plans produced there will be no difference in speed between the LIKE and SUBSTRING:

But again don’t believe the execution plans. Run the two queries in that same loop to see the real-world performance and you will be surprised…

USE [AdventureWorks]
GO
SET NOCOUNT ON
DECLARE @StartTime1 datetime, @StartTime2 datetime,
	@EndTime1 datetime, @EndTime2 datetime, @Counter int, @V int
SELECT @V = COUNT(*) -- run it once before starting to cache everything needed
			FROM HumanResources.Employee
			WHERE SUBSTRING(Title,1,1) = 'J'
SELECT @StartTime1 = GETDATE(), @Counter = 0
WHILE @Counter < 200000 -- Run SUBSTRING 200,000 times
        BEGIN
        SELECT @V = COUNT(*)
			FROM HumanResources.Employee
			WHERE SUBSTRING(Title,1,1) = 'J'
        SET @Counter = @Counter + 1
        END
SELECT @EndTime1 = GETDATE()

SELECT @StartTime2 = GETDATE(), @Counter = 0
WHILE @Counter < 200000 -- Run LIKE 200,000 times
        BEGIN
        SELECT @V = COUNT(*)
			FROM HumanResources.Employee
			WHERE Title LIKE 'J%'
        SET @Counter = @Counter + 1
        END
SELECT @EndTime2 = GETDATE()
SELECT 'Run time of SUBSTRING' = DATEDIFF(ms, @StartTime1, @EndTime1),
        'Run time of LIKE' = DATEDIFF(ms, @StartTime2, @EndTime2),
        'Improvement of LIKE over SUBSTRING (covered)' = CONVERT(varchar(9), (CONVERT(DEC(6,2), ((DATEDIFF(ms, @StartTime1, @EndTime1) * 1.00) / DATEDIFF(ms, @StartTime2, @EndTime2))))) + ' X improvement'
GO

As the results below show, the LIKE statement is still vastly superior in speed (30% faster) to the SUBSTRING function even when there is no index on a column.

As this article has shown, there are vanishingly few cases where the performance of the LIKE statement does not completely eclipse the performance of the SUBSTRING function no matter if the column is indexed or not.

Note that these tests were run on SQL Server 2008 SP1.

Comments

Comment from MOHAN DEVAL
Time April 7, 2011 at 9:48 am

Good

Write a comment

You need to login to post comments!