Part of your initial query is as follows:
FROM [dbo].[calendar] a
LEFT JOIN [dbo].[colleagueList] b
ON b.[Date] = a.d
WHERE DAY(a.[d]) = 1
AND a.[d] BETWEEN @dateStart AND COALESCE(@dateEnd,@dateStart)
That section of the plan is shown below:

[execution plan screenshot]

Your revised query, with BETWEEN @dateStart AND ISNULL(@dateEnd,@dateStart), has this for the same join:

[execution plan screenshot]

The difference seems to be that ISNULL simplifies further, and as a result you get more accurate cardinality estimates going into the next join. This is an inline table-valued function and you are calling it with literal values, so the optimizer can do something like:
a.[d] BETWEEN @dateStart AND ISNULL(@dateEnd,@dateStart)
a.[d] BETWEEN '2013-06-01' AND ISNULL(NULL,'2013-06-01')
a.[d] BETWEEN '2013-06-01' AND '2013-06-01'
a.[d] = '2013-06-01'
And as there is an equi-join predicate b.[Date] = a.d, the plan also shows an equality predicate b.[Date] = '2013-06-01'. As a result the cardinality estimate of 28,393 rows is likely to be pretty accurate.
For the CASE/COALESCE version, when @dateStart and @dateEnd are the same value it simplifies to the same equality expression and gives the same plan, but when @dateStart = '2013-06-01' and @dateEnd IS NULL it only gets as far as
a.[d] >= '2013-06-01' AND a.[d] <= CASE WHEN (1) THEN '2013-06-01' ELSE NULL END
which it also applies as an implied predicate on colleagueList. The estimated number of rows this time is 79.8 rows.
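This partial simplification follows from how COALESCE is defined. Per the SQL standard (and SQL Server's documentation) COALESCE is shorthand for a CASE expression, which is why a residual CASE survives in the predicate above, whereas ISNULL is a built-in function that the optimizer can fold straight down to a constant:

-- COALESCE(@dateEnd, @dateStart) is expanded by the parser to:
CASE WHEN @dateEnd IS NOT NULL THEN @dateEnd ELSE @dateStart END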
The next join along is
LEFT JOIN colleagueTime
ON colleagueTime.TC_DATE = colleagueList.Date
AND colleagueTime.ASSOC_ID = CAST(colleagueList.ID AS VARCHAR(10))
colleagueTime is a 3,249,590-row table which is (again) apparently a heap with no useful indexes.
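If adding an index is an option, something along these lines might help either plan shape (a sketch only; the table definition is not shown in the question, so the column types are assumptions):

CREATE INDEX IX_colleagueTime_TC_DATE_ASSOC_ID
    ON [dbo].[colleagueTime] (TC_DATE, ASSOC_ID);

Note that the CAST in the join predicate is applied to colleagueList.ID rather than to colleagueTime.ASSOC_ID, so the cast should not by itself prevent a seek on this index.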
This discrepancy in estimates affects the join choice. The ISNULL plan chooses a hash join that just scans the table once. The COALESCE plan chooses a nested loops join, and estimates that it will still only need to scan the table once, spool the result, and replay it 78 times; i.e. it estimates that the correlated parameters will not change. Given that the nested loops plan was still running after two hours, this assumption of a single scan against colleagueTime seems likely to be highly inaccurate.
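One way to test that theory without rewriting the expression would be to force the join strategy with a hint (a diagnostic sketch only; OPTION (HASH JOIN) constrains every join in the query, so it is a blunt instrument rather than a permanent fix):

SELECT ...
FROM [dbo].[calendar] a
LEFT JOIN [dbo].[colleagueList] b
    ON b.[Date] = a.d
LEFT JOIN colleagueTime
    ON colleagueTime.TC_DATE = colleagueList.Date
    AND colleagueTime.ASSOC_ID = CAST(colleagueList.ID AS VARCHAR(10))
WHERE DAY(a.[d]) = 1
    AND a.[d] BETWEEN @dateStart AND COALESCE(@dateEnd, @dateStart)
OPTION (HASH JOIN);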
As for why the estimated numbers of rows between the two joins are so much lower, I'm not sure without being able to see the statistics on the tables. The only way I managed to skew the estimated row counts that much in my testing was by adding a load of NULL rows (this reduced the estimated row count even though the actual number of rows returned remained the same).
The estimated row count in the COALESCE plan with my test data was on the order of
number of rows matching >= condition * 30% * (proportion of rows in the table not null)
Or in SQL:

SELECT 1E0 * COUNT([Date]) / COUNT(*)
       * (COUNT(CASE WHEN [Date] >= '2013-06-01' THEN 1 END) * 0.30)
FROM [dbo].[colleagueList]
but this does not square with your comment that the column has no NULL values.
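That is easy to double-check, since COUNT(*) counts all rows while COUNT([Date]) skips NULLs (a quick diagnostic sketch):

SELECT COUNT(*) - COUNT([Date]) AS null_date_rows
FROM [dbo].[colleagueList];

If this returns 0 the column really has no NULLs and the skew in the estimate must have another cause.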
Comments:

"…CASE with COALESCE(@dateEnd,@dateStart), does the issue still appear?" – ypercube Jul 17 at 0:52
"…ISNULL()?" – ypercube Jul 17 at 0:58
"What does SELECT task_state FROM sys.dm_os_tasks WHERE session_id = x show? If it spends a lot of time not in the RUNNING state, what wait types is that session getting in sys.dm_os_waiting_tasks?" – Martin Smith Jul 17 at 6:50
"…COALESCE. ISNULL fixed it." – FreshPrinceOfSO Jul 17 at 12:57