Tell me more ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have a SELECT that can return hundreds of rows from a table (table can be ~50000 rows). My app is interested in knowing the number of rows returned, it means something important to me, but it actually uses only the top 5 of those hundreds of rows. What I want to do is limit the SELECT query to return only 5 rows, but also tell my app how many it would have returned (the hundreds). This is the original query:

SELECT id, a, b, c FROM table WHERE a < 2

Here is what I came up with - a CTE - but I don't feel comfortable with the total row count appearing in every column. Ideally I would want a result set of the TOP 5 and a returned parameter for the total row count.

WITH Everything AS
(
   SELECT id, a, b, c FROM table
),
DetermineCount AS
(
   SELECT COUNT(*) AS Total FROM Everything 
)
SELECT TOP (5) id, a, b, c, Total
FROM Everything 
CROSS JOIN DetermineCount;

Can you think of a better way?

Is there a way in T-SQl to return the affected row count of a select top query before the top was applied? @@rowcount would return 5 but I wonder if there is a @@rowcountbeforetop sort of thing.

Thanks in advance for your help.

** Update **

This is what I'm doing now and I kind of like it over the CTE although CTEs as so elegant.

-- @count is passed in as an out param to the stored procedure

CREATE TABLE dbo.#everything (id int, a int, b int, c int);

INSERT INTO #everything 
SELECT id, a, b, c FROM table WHERE a < 2;      

SET @count = @@rowcount;
SELECT TOP (5) id FROM #everything;
DROP TABLE #everything; 
share|improve this question
1  
What's wrong with executing two queries? One with the SELECT COUNT(*) and one with the TOP? – DeCaf 2 days ago
1  
You shouldn't use a TOP without an ORDER clause. How do you define TOP? – Nick Vaccaro 2 days ago
@DeCaf good point but it would look quiet redundant don't you think? especially if the select grows eventually to multiple joins or conditionals. But yes that would give me what I want on the expense of redundant code. – sOltan 2 days ago
@NickVaccaro good observation, in this case I just want a random 5, ordering doesn't mean much for the app. – sOltan 2 days ago
1  
Well TOP without ORDER BY won't give you "random"... – Aaron Bertrand 2 days ago

3 Answers

Assuming you want the top 5 ordering by id ascending, this will do it with a single pass through your table.

; WITH Everything AS 
(
    SELECT id
        , a
        , b
        , c
        , ROW_NUMBER() OVER (ORDER BY id ASC) AS rn_asc
        , ROW_NUMBER() OVER (ORDER BY id DESC) AS rn_desc
    FROM <table>
)
SELECT id
    , a
    , b
    , c
    , rn_asc + rn_desc - 1 AS total_rows
FROM Everything
WHERE rn_asc <= 5
share|improve this answer
You can do this more efficiently and less verbosely by using COUNT(*) OVER() – Aaron Bertrand 2 days ago
Thanks, but the result is similar to what I currently get (with the total rows showing up in all 5 rows). – sOltan 2 days ago
1  
@sOltan well, where do you WANT the total rows showing up? As a different resultset? If so, then just run a separate query! – Aaron Bertrand 2 days ago
@Aaron I agree. – sOltan 2 days ago

Here's a relatively efficient way to get 5 random rows and include the total count. The random element will introduce a full sort no matter where you put it.

SELECT TOP (5) id,a,b,c,total = COUNT(*) OVER()
  FROM dbo.mytable
  ORDER BY NEWID();
share|improve this answer
This is nice! but it still includes the count in every row as a column, and somehow when I tried it generated a lot of logical reads on an object called 'worktable'. See my update to the original post, I am leaning towards separating the count from the top 5. Thanks – sOltan 2 days ago
1  
@sOltan And did these extra reads actually lead to an observable performance problem? You seem very confused about your goal. I think that you need to decide if you care whether the count is included in every row of the result or not. There is no easy way to include it only once and also avoid worktable stuff (which I suspect is due to the random order, NOT the count - maybe you should separate those two requirements as well). – Aaron Bertrand 2 days ago
there was no observable performance problem and your query is awesome, I've made a note of it. Right now, I'm returning the row count as an out parameter, I've posted an update below. Thank you so much for your help and would appreciate any feedback you have on my update. – sOltan 2 days ago

** Update **

This is what I'm doing now and I kind of like it over the CTE although CTEs as so elegant. Let me know what you think. Thanks!

-- @count is passed in as an out param to the stored procedure

CREATE TABLE dbo.#everything (id int, a int, b int, c int);

INSERT INTO #everything 
SELECT id, a, b, c FROM table WHERE a < 2;      

SET @count = @@rowcount;
SELECT TOP (5) id FROM #everything;
DROP TABLE #everything; 
share|improve this answer
Why do you need the #temp table? Do you think inserting into a new object and obtaining @@ROWCOUNT is cheaper than just SELECT COUNT(*) from the base table? Also, again, TOP without ORDER BY does not give you random, if that's what you're after. There is a big difference between RANDOM and UNDEFINED. – Aaron Bertrand 2 days ago
Personally, I don't like output params. I think they're too easy to miss, and that returning a second table with a single value is cleaner. However, that's my opinion, coming from someone who considers what the application layer looks like. I'd prefer to see the C# (or whatever) handling 2 data tables in a set. Best of luck either way! – Nick Vaccaro 2 days ago
1  
"no additional overhead" you mean aside from scanning the entire table and inserting it into a #temp table? Why do you think this doesn't involve any overhead? – Aaron Bertrand 2 days ago
2  
It sounds like you've got enough suggestions to get this done. None of these will be significantly better or worse than any other, so it's all up to you. I would suggest choosing your answer, then closing this question. If you have more questions, present them separately. – Nick Vaccaro 2 days ago
2  
Ok, so scanning the entire range of the index that matches the WHERE clause (which wasn't in your original question, by the way!). Why do you think a #temp table is "in memory"? I think you have a pretty skewed understanding of what "not any more efficient" and "less efficient" mean. – Aaron Bertrand 2 days ago
show 6 more comments

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.