There are a few things that are off in this query. Going through them in some sense of order:
I am still a fan of 'old style' joins, but, the CTE concept is a real winner in SQL Server, so, instead of creating the table variable, just use the CTE.
I can tell that your code went through some iterations, and as a result, you have some 'cruft' that can be disposed of. Delete those things that you don't select (like you select min(A.CreationDate) as AnswerTime
which is never used.
The SEDE tables take some getting used to. You have done a CharIndex on the tags column. That's fine, but, your example php also pulls in php5 tags, as well as cakephp. To get around this, you should be aware that SEDE stores the tag names in <...>
braces in the Tags field, so searching for <php>
is what you want.
Using the CharIndex is OK, but more standard will be using like
, as in: ... and Tags like '%<php>%'
I have actally given up on using the Tags field completely, and instead I do the join to the PostTags table which allows you to do the join natively without the like/charindex.
There is no need for the aggregate column on the Question's CreationDate. The max(Q.CreationDate)
is redundant (there will only be one CreationDate per Question ID).
You should always use as many constants as you can when doing SQL Queries (at least as much as you can, until you find a performance reason to not overdo it....), so you should also add the A.PostTypeId = 2
even though its ParentID is the Q's ID.
I did not like the way you were 'Slotting' the data in to months. I find that subtracting the days-in-month-less-one from the given date brings the date back to the first-of-the-month, and then truncating the time, allows you to have all dates represented by exactly the first-of-the-month:
Convert(Date, DateAdd(day, 1 - DatePart(day, Q.CreationDate), Q.CreationDate)) as CreationMonth
Getting the Variable substitution to work was easier once I used the Tags table too. The stupid 'Unexpected error, blame has already been assigned' error is a PITA, but it is now gone.
Finally, the scale of the Y axis was so huge, that the count of the questions was not visibly appealing/apparent. By changing the scale from Seconds to Hours, the scales are better.
I forked your query here, and this is the SQL (I have tried to keep the capitalization and other style conventions consistent with your code):
declare @Language nvarchar(25) = ##LanguageTag:string##;
declare @tagid int;
select @tagid = Id
from Tags
where TagName = @Language
print @Language + '->' + Convert(NVarchar(max), @tagid);
with AnswerTime as (
Select Q.Id,
Convert(Date, DateAdd(day, 1 - DatePart(day, Q.CreationDate), Q.CreationDate)) as CreationMonth,
Convert(float, DATEDIFF(second, Q.CreationDate, min(A.CreationDate))) AS Elapsed
From PostTags T,
Posts Q,
Posts A
Where T.TagId = @tagid
and T.PostId = Q.Id
and Q.PostTypeId = 1
and A.PostTypeId = 2
and Q.ClosedDate is null
and Q.Id = A.ParentId
Group by Q.Id,
Q.CreationDate,
Convert(Date, DateAdd(day, 1 - DatePart(day, Q.CreationDate), Q.CreationDate))
)
Select CreationMonth,
count(*) as [Count of Tickets],
avg(Elapsed)/3600.0 as [Average hours to first Answer]
From AnswerTime
Group by CreationMonth
Edit/Update
I ran the query on Stack Overflow, and it fails because there are so many questions that the avg(Elapsed)
fails with an arithmetic overflow. I have updated the query to only consider the past 68 years worth of data... (in case there are large date differences in seconds), and also to convert the elapsed in to a float (which does not overflow on avg()
)