T-SQL get row count before TOP is applied

Question

I have a SELECT that can return hundreds of rows from a table (table can be ~50000 rows). My app is interested in knowing the number of rows returned, it means something important to me, but it actually uses only the top 5 of those hundreds of rows. What I want to do is limit the SELECT query to return only 5 rows, but also tell my app how many it would have returned (the hundreds). This is the original query:

SELECT id, a, b, c FROM table WHERE a < 2

Here is what I came up with - a CTE - but I don't feel comfortable with the total row count appearing in every column. Ideally I would want a result set of the TOP 5 and a returned parameter for the total row count.

WITH Everything AS
(
   SELECT id, a, b, c FROM table
),
DetermineCount AS
(
   SELECT COUNT(*) AS Total FROM Everything 
)
SELECT TOP (5) id, a, b, c, Total
FROM Everything 
CROSS JOIN DetermineCount;

Can you think of a better way?

Is there a way in T-SQl to return the affected row count of a select top query before the top was applied? @@rowcount would return 5 but I wonder if there is a @@rowcountbeforetop sort of thing.

Thanks in advance for your help.

** Update **

This is what I'm doing now and I kind of like it over the CTE although CTEs as so elegant.

-- @count is passed in as an out param to the stored procedure

CREATE TABLE dbo.#everything (id int, a int, b int, c int);

INSERT INTO #everything 
SELECT id, a, b, c FROM table WHERE a < 2;      

SET @count = @@rowcount;
SELECT TOP (5) id FROM #everything;
DROP TABLE #everything;

What's wrong with executing two queries? One with the SELECT COUNT(*) and one with the TOP?
You shouldn't use a TOP without an ORDER clause. How do you define TOP?
@DeCaf good point but it would look quiet redundant don't you think? especially if the select grows eventually to multiple joins or conditionals. But yes that would give me what I want on the expense of redundant code.
@NickVaccaro good observation, in this case I just want a random 5, ordering doesn't mean much for the app.

Nick Vaccaro · Answer 1 · 2013-04-23 18:03:06Z

up vote 1 down vote

Assuming you want the top 5 ordering by id ascending, this will do it with a single pass through your table.

; WITH Everything AS 
(
    SELECT id
        , a
        , b
        , c
        , ROW_NUMBER() OVER (ORDER BY id ASC) AS rn_asc
        , ROW_NUMBER() OVER (ORDER BY id DESC) AS rn_desc
    FROM <table>
)
SELECT id
    , a
    , b
    , c
    , rn_asc + rn_desc - 1 AS total_rows
FROM Everything
WHERE rn_asc <= 5

answered 2 days ago

Nick Vaccaro
2,9531029

You can do this more efficiently and less verbosely by using COUNT(*) OVER() – Aaron Bertrand 2 days ago

Thanks, but the result is similar to what I currently get (with the total rows showing up in all 5 rows). – sOltan 2 days ago

1

@sOltan well, where do you WANT the total rows showing up? As a different resultset? If so, then just run a separate query! – Aaron Bertrand 2 days ago

@Aaron I agree. – sOltan 2 days ago

Aaron Bertrand · Answer 2 · 2013-04-23 18:33:21Z

up vote 2 down vote

Here's a relatively efficient way to get 5 random rows and include the total count. The random element will introduce a full sort no matter where you put it.

SELECT TOP (5) id,a,b,c,total = COUNT(*) OVER()
  FROM dbo.mytable
  ORDER BY NEWID();

answered 2 days ago

Aaron Bertrand
75.6k94281

This is nice! but it still includes the count in every row as a column, and somehow when I tried it generated a lot of logical reads on an object called 'worktable'. See my update to the original post, I am leaning towards separating the count from the top 5. Thanks – sOltan 2 days ago

1

@sOltan And did these extra reads actually lead to an observable performance problem? You seem very confused about your goal. I think that you need to decide if you care whether the count is included in every row of the result or not. There is no easy way to include it only once and also avoid worktable stuff (which I suspect is due to the random order, NOT the count - maybe you should separate those two requirements as well). – Aaron Bertrand 2 days ago

there was no observable performance problem and your query is awesome, I've made a note of it. Right now, I'm returning the row count as an out parameter, I've posted an update below. Thank you so much for your help and would appreciate any feedback you have on my update. – sOltan 2 days ago

sOltan · Answer 3 · 2013-04-23 18:49:02Z

up vote 0 down vote

** Update **

This is what I'm doing now and I kind of like it over the CTE although CTEs as so elegant. Let me know what you think. Thanks!

-- @count is passed in as an out param to the stored procedure

CREATE TABLE dbo.#everything (id int, a int, b int, c int);

INSERT INTO #everything 
SELECT id, a, b, c FROM table WHERE a < 2;      

SET @count = @@rowcount;
SELECT TOP (5) id FROM #everything;
DROP TABLE #everything;

answered 2 days ago

sOltan
3419

Why do you need the #temp table? Do you think inserting into a new object and obtaining @@ROWCOUNT is cheaper than just SELECT COUNT(*) from the base table? Also, again, TOP without ORDER BY does not give you random, if that's what you're after. There is a big difference between RANDOM and UNDEFINED. – Aaron Bertrand 2 days ago

Personally, I don't like output params. I think they're too easy to miss, and that returning a second table with a single value is cleaner. However, that's my opinion, coming from someone who considers what the application layer looks like. I'd prefer to see the C# (or whatever) handling 2 data tables in a set. Best of luck either way! – Nick Vaccaro 2 days ago

1

"no additional overhead" you mean aside from scanning the entire table and inserting it into a #temp table? Why do you think this doesn't involve any overhead? – Aaron Bertrand 2 days ago

2

It sounds like you've got enough suggestions to get this done. None of these will be significantly better or worse than any other, so it's all up to you. I would suggest choosing your answer, then closing this question. If you have more questions, present them separately. – Nick Vaccaro 2 days ago

2

Ok, so scanning the entire range of the index that matches the WHERE clause (which wasn't in your original question, by the way!). Why do you think a #temp table is "in memory"? I think you have a pretty skewed understanding of what "not any more efficient" and "less efficient" mean. – Aaron Bertrand 2 days ago

show 6 more comments

asked	2 days ago
viewed	63 times
active	2 days ago

T-SQL get row count before TOP is applied

3 Answers

Your Answer

Not the answer you're looking for? Browse other questions tagged sql-server tsql select or ask your own question.

Community Bulletin

T-SQL get row count before TOP is applied

3 Answers

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged sql-server tsql select or ask your own question.

Community Bulletin

Related