Optimize Postgres timestamp query range

Question

I have the following table and indices defined:

CREATE TABLE ticket
(
  wid bigint NOT NULL DEFAULT nextval('tickets_id_seq'::regclass),
  eid bigint,
  created timestamp with time zone NOT NULL DEFAULT now(),
  status integer NOT NULL DEFAULT 0,
  argsxml text,
  moduleid character varying(255),
  source_id bigint,
  file_type_id bigint,
  file_name character varying(255),
  status_reason character varying(255),
  ...
)

I created an index on the created timestamp as follows:

CREATE INDEX ticket_1_idx
  ON ticket
  USING btree
  (created );

and here's my query

select * from ticket 
where created between '2012-12-19 00:00:00' and  '2012-12-20 00:00:00'

This was working fine until the number of records started to grow (about 5 million) and now it's taking forever to return.

Explain analyze reveals this:

"Index Scan using ticket_1_idx on ticket  (cost=0.00..10202.64 rows=52543 width=1297) (actual time=0.109..125.704 rows=53340 loops=1)"
"  Index Cond: ((created >= '2012-12-19 00:00:00+00'::timestamp with time zone) AND (created <= '2012-12-20 00:00:00+00'::timestamp with time zone))"
"Total runtime: 175.853 ms"

So far I've tried setting

random_page_cost = 1.75 
effective_cache_size = 3

Also created

create CLUSTER ticket USING ticket_1_idx;

Nothing works. What am I doing wrong? Why is it selecting sequential scan? The indexes are supposed to make the query fast. Anything that can be done to optimize it?

Also it's only taking 175ms to run. If it's taking forever, it's likely the OP has a huge dataset that's taking a long time to transfer across the wire, not time to run the query.\ — Scott Marlowe, Dec 22 '12 at 5:17
BTW: effective_cache_size=3 might be a bit too low. (but probably won't harm in this case) — wildplasser, Dec 22 '12 at 13:14
Don't do a select * as it will increase the result set size to be transferred to the client. — Clodoaldo Neto, Dec 27 '12 at 5:43

Erwin Brandstetter · Answer 1 · 2014-02-07 13:38:40Z

`CLUSTER`

If you intend to use CLUSTER, the displayed syntax is invalid.

~~create CLUSTER ticket USING ticket_1_idx;~~

Run once:

CLUSTER ticket USING ticket_1_idx;

This can help a lot with bigger result sets. Not so much for a single row returned. If you table isn't read-only you need to re-run

CLUSTER ticket;

at certain intervals. Possibly only on volatile partitions. See below.

`pg_repack`

CLUSTER takes an exclusive lock on the table, which may be a problem in a multi-user environment. Quoting the manual:

When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished.

Bold emphasis mine. Consider the alternative pg_repack:

Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

and:

pg_repack needs to take an exclusive lock at the end of the reorganization.

Version 1.2 works with

PostgreSQL 8.3, 8.4, 9.0, 9.1, 9.2, 9.3

Query

The query is simple enough not to cause any performance problems per se.

However, a word on correctness: The BETWEEN construct includes borders. Your query selects all of Dec. 19, plus records from Dec. 20, 0:0 hours. That's an extremely unlikely requirement. Chances are, you really want this:

SELECT *
FROM   ticket 
WHERE  created >= '2012-12-19 0:0'
AND    created <  '2012-12-20 0:0';

Performance

First off, you ask:

Why is it selecting sequential scan?

Well, your EXPLAIN output clearly shows an Index Scan, not a sequential table scan. There must be some kind of misunderstanding.

If you are pressed hard for better performance, you may be able to improve things. But the necessary background information is not in the question.

You could only query required columns instead of * to reduce transfer cost.
You could look at partitioning and put practical time slices into separate tables. Add indexes to partitions as needed.
If partitioning is not an option, another related but less intrusive technique would be to add one or more partial indexes.
For example, if you mostly query the current month, you could create the following partial index:
```
CREATE INDEX ticket_created_idx ON ticket(created)
WHERE created >= '2012-12-01 00:00:00'::timestamp;
```
DROP / CREATE the index with the start of a new month. You can easily automate the task with a cron job.
Keep the total index in addition for CLUSTER (which cannot operate on partial indexes). If old records never change, table partitioning would help this task a lot, since you only need to re-cluster newer partitions.

If you combine the last two steps, performance should be rather awesome.

Performance Basics

Of course, the usual performance advice applies:
https://wiki.postgresql.org/wiki/Slow_Query_Questions
https://wiki.postgresql.org/wiki/Performance_Optimization

You may be missing one of the basics.

asked	1 year ago
viewed	4733 times
active	8 months ago

current community

your communities

more stack exchange communities

Optimize Postgres timestamp query range

1 Answer 1

`CLUSTER`

`pg_repack`

Query

Performance

Performance Basics

Your Answer

Not the answer you're looking for? Browse other questions tagged postgresql indexing query-optimization database-partitioning postgresql-performance or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

Optimize Postgres timestamp query range

1 Answer 1

CLUSTER

pg_repack

Query

Performance

Performance Basics

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged postgresql indexing query-optimization database-partitioning postgresql-performance or ask your own question.

Linked

Related

Hot Network Questions

`CLUSTER`

`pg_repack`