2

How does a function like SUM work? If I execute

select id,sum(a) from mytable group by id

does it sort by id and then sum over each range of equal id's? I am no planner expert, but it looks like that is what is happening, where mytable is maybe a hundred million rows with a few million distinct id's.

Or does it just keep a hash of id -> current_sum, and then at each row either increments the value of id or add a new key? Isn't that far faster and less memory hungry?

1
  • It is good to consider whether 'faster' means faster to get the last row or the first row of the result set (or something else). Commented Mar 24, 2011 at 11:17

2 Answers 2

1

SQL standards try to dictate external behavior, not internal behavior. In this particular case, a SQL implementation that conforms to (one of the many) standards is supposed to act like it does things in this order.

  1. Build a working table from all the table constructors in the FROM clause. (There's only one in your example.)

  2. In the GROUP BY clause, partition the working table into groups. Reduce each group to one row. Replace the working table with the grouped table.

  3. Resolve the expressions in the SELECT clause.

Query optimizers that follow SQL standards are free to rearrange things however they like, as long as the result is the same as if it had followed those steps.

You can find more details in the answers and comments to this SO question.

Sign up to request clarification or add additional context in comments.

Comments

0

So, I found this, http://helmingstay.blogspot.com/2009/06/postgresql-poetry-aggregate-median-with.html, which claims that it does indeed use the accumulator pattern. Hmmm.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.