
I have a table with a datetime field "updated_at". Many of my queries filter on this field with range conditions, such as selecting rows where updated_at > a certain date.

I already added an index on updated_at, but most of my queries are still very slow, even when I add a LIMIT on the number of rows returned.

What else can I do to optimize queries that query on datetime fields?
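For illustration, the setup described above might look something like this (the events table and index names are hypothetical, reconstructed from the description):

-- Hypothetical schema matching the description in the question.
CREATE TABLE events (
    id         serial PRIMARY KEY,
    payload    text,
    updated_at timestamptz NOT NULL
);
CREATE INDEX events_updated_at_idx ON events (updated_at);

-- Typical slow query shape: open-ended range plus a LIMIT.
SELECT *
FROM events
WHERE updated_at > '2013-05-01'
LIMIT 100;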

Can you post the explain plan, total number of rows and the exact value of "very slow"? –  Jakub Kania May 19 '13 at 21:13
Please read stackoverflow.com/tags/postgresql-performance/info (and the linked SlowQueryQuestions wiki page) then update your question with suitable explain analyze results and report back. Since you're using a query generator you might need to use auto_explain or to log queries and re-execute them by hand. –  Craig Ringer May 19 '13 at 23:43
Please post the schema and the type of query that is slow. The question as it is phrased cannot reasonably be answered... –  Denis May 20 '13 at 5:45

3 Answers

Usually database optimizers won't choose to use indexes for open-ended ranges, such as updated_at > somedate.

But in many cases the datetime column won't exceed "now", so you can preserve the semantics of > somedate by converting the condition to a range using between, like this:

where updated_at between somedate and current_timestamp

A between predicate is much more likely to cause the optimizer to choose to use an index.
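For instance, a complete version of the rewritten query, using the hypothetical events table sketched in the question, might look like this:

-- Hypothetical table; the open-ended range is closed at the current time.
SELECT *
FROM events
WHERE updated_at BETWEEN '2013-05-01' AND current_timestamp
ORDER BY updated_at
LIMIT 100;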


Please post whether this approach improved your query's performance.

Is this really true of PostgreSQL? I would think that the optimiser would look at the range of values in the relevant column, via pg_statistics, and produce an estimated cardinality of the result set for the predicate. If the maximum value is less than or equal to current_timestamp then I wouldn't think there would be much difference. It would be interesting for Henley to test, though -- the explain plan would reveal all. –  David Aldridge May 19 '13 at 21:31
@DavidAldridge in my experience, > is just not optimized well. I'd also like Harvey to post results. –  Bohemian May 19 '13 at 22:04
Postgres will use an index for > if it's useful. No need for a between: see here for an example: sqlfiddle.com/#!12/e3142/3 It all depends, as usual with an index, on whether or not the cost of using the index is less than something else. –  a_horse_with_no_name May 19 '13 at 22:12
> is optimized just fine in Postgres and uses BTree indexes when it's appropriate to do so based on the table stats. –  Denis May 20 '13 at 5:40
@a_horse_with_no_name nice fiddle. I'm sold. Still, I'll leave the answer up until the OP posts results. –  Bohemian May 20 '13 at 6:39

Assuming that the index is being used but performance is still poor, the only remedy I can think of is to cluster the table by that index: http://www.postgresql.org/docs/9.1/static/sql-cluster.html

This will physically co-locate rows with nearby updated_at values, improving performance for queries that access the table via that index, particularly for large range scans.
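A minimal sketch, reusing the hypothetical table and index names from the question:

-- One-time physical reorder of the table rows into index order.
CLUSTER events USING events_updated_at_idx;
ANALYZE events;  -- refresh planner statistics after the reorder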

Pay attention to the warnings in the documentation though, and note that as rows are updated the clustering is not preserved.

Also:

When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished.

Based on these restrictions it may not be a viable solution for your case, but might be useful to others.


For any given query, the use of an index depends on the cost of using that index compared to the cost of a sequential scan.

Frequently developers think that because there is an index, a query should run faster, and that if a query runs slowly, an index is the solution. That is usually true when the query returns few tuples, but as the number of tuples in the result increases, the cost of using an index can exceed the cost of a sequential scan.

You are using Postgres. Postgres does not maintain physical clustering on a given attribute (tables are not index-organized; CLUSTER is only a one-time reorder). That means that when confronted with a range query (of the type att > a and att < b), Postgres needs to estimate the number of tuples in the result (make sure you VACUUM ANALYZE your database frequently so the statistics stay current) and the cost of using an index compared to doing a sequential scan. It will then decide which method to use.

You can inspect this decision by running

EXPLAIN ANALYZE <query>; 

in psql. It will tell you if it uses an index or not.
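For example, against the hypothetical table sketched in the question:

-- The plan output shows whether the index or a sequential scan was chosen.
EXPLAIN ANALYZE
SELECT *
FROM events
WHERE updated_at > now() - interval '7 days'
LIMIT 100;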

If you really, really want to use an index instead of a sequential scan (sometimes it is needed), and you really, really know what you are doing, you can change the cost of a sequential scan in the planner constants or disable sequential scans in favor of any other method. See this page for the details:

http://www.postgresql.org/docs/9.1/static/runtime-config-query.html

Make sure you browse the correct version of the documentation.
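As a sketch, such an experiment in a psql session might look like the following. The settings enable_seqscan and random_page_cost are real planner parameters, but the values here are only illustrative, and this is meant for diagnosis rather than production:

-- Session-local experiment; RESET restores the defaults afterwards.
SET enable_seqscan = off;      -- strongly discourage sequential scans
SET random_page_cost = 1.1;    -- e.g. a lower value on SSD-backed storage
EXPLAIN ANALYZE SELECT * FROM events WHERE updated_at > '2013-05-01';
RESET enable_seqscan;
RESET random_page_cost;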

--dmg

