Our system write a lots of data (kind of Big Data system). The write performance is good enough for our needs but the read performance is really too slow.
The primary key (constraint) structure is similar for all our tables: timestamp(Timestamp) ; index(smallint) ; key(integer)
A table can have millions of row, even billion of rows, and a read request is usually for a specific period (timestamp / index) and tag. It's common to have a query that return around 200k lines. Currently, we can read about 15k lines per second but we need to be 10 times faster. Is this possible and if so, how?
Note: PostgreSQL is packaged with our software, so the hardware is different from one client to another.
[Edit] Added details below, performance was better for this test because I don't have access to the real setup right now. I will update as soon as I can access the setup.
[Edit2] Applied "dezso" suggestions, see configuration changes below and the specs of the server used for testing. Yes it's a VM used for testing, the VMs host is a Server 2008 R2 x64 with 24.0 GB of ram.
Server Spec (Virtual Machine VMWare)
Server 2008 R2 x64
2.00 GB of memory
Intel Xeon W3520 @ 2.67GHz (2 cores)
postgresql.conf optimisations
shared_buffers = 512MB (default: 32MB)
effective_cache_size = 1024MB (default: 128MB)
checkpoint_segment = 32 (default: 3)
checkpoint_completion_target = 0.9 (default: 0.5)
default_statistics_target = 1000 (default: 100)
work_mem = 100MB (default: 1MB)
maintainance_work_mem = 256MB (default: 16MB)
Table Definition
CREATE TABLE "AnalogTransition"
(
"KeyTag" integer NOT NULL,
"Timestamp" timestamp with time zone NOT NULL,
"TimestampQuality" smallint,
"TimestampIndex" smallint NOT NULL,
"Value" numeric,
"Quality" boolean,
"QualityFlags" smallint,
"UpdateTimestamp" timestamp without time zone, -- (UTC)
CONSTRAINT "PK_AnalogTransition" PRIMARY KEY ("Timestamp" , "TimestampIndex" , "KeyTag" ),
CONSTRAINT "FK_AnalogTransition_Tag" FOREIGN KEY ("KeyTag")
REFERENCES "Tag" ("Key") MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE,
autovacuum_enabled=true
);
Query
The query take about 30 seconds to execute in pgAdmin3, but we would like to have the same result under 5 seconds if possible.
SELECT
"AnalogTransition"."KeyTag",
"AnalogTransition"."Timestamp" AT TIME ZONE 'UTC',
"AnalogTransition"."TimestampQuality",
"AnalogTransition"."TimestampIndex",
"AnalogTransition"."Value",
"AnalogTransition"."Quality",
"AnalogTransition"."QualityFlags",
"AnalogTransition"."UpdateTimestamp"
FROM "AnalogTransition"
WHERE "AnalogTransition"."Timestamp" >= '2013-05-16 00:00:00.000' AND "AnalogTransition"."Timestamp" <= '2013-05-17 00:00:00.00' AND ("AnalogTransition"."KeyTag" = 56 OR "AnalogTransition"."KeyTag" = 57 OR "AnalogTransition"."KeyTag" = 58 OR "AnalogTransition"."KeyTag" = 59 OR "AnalogTransition"."KeyTag" = 60)
ORDER BY "AnalogTransition"."Timestamp" DESC, "AnalogTransition"."TimestampIndex" DESC
LIMIT 500000;
Explain (Edit2: Updated)
"Limit (cost=0.00..125668.31 rows=500000 width=33) (actual time=2.193..3241.319 rows=500000 loops=1)"
" Buffers: shared hit=190147"
" -> Index Scan Backward using "PK_AnalogTransition" on "AnalogTransition" (cost=0.00..389244.53 rows=1548698 width=33) (actual time=2.187..1893.283 rows=500000 loops=1)"
" Index Cond: (("Timestamp" >= '2013-05-16 01:00:00-04'::timestamp with time zone) AND ("Timestamp" <= '2013-05-16 15:00:00-04'::timestamp with time zone))"
" Filter: (("KeyTag" = 56) OR ("KeyTag" = 57) OR ("KeyTag" = 58) OR ("KeyTag" = 59) OR ("KeyTag" = 60))"
" Buffers: shared hit=190147"
"Total runtime: 3863.028 ms"
In my latest test, It took 7 minutes to select my data!!! See below
Explain (Edit3)
"Limit (cost=0.00..313554.08 rows=250001 width=35) (actual time=0.040..410721.033 rows=250001 loops=1)"
" -> Index Scan using "PK_AnalogTransition" on "AnalogTransition" (cost=0.00..971400.46 rows=774511 width=35) (actual time=0.037..410088.960 rows=250001 loops=1)"
" Index Cond: (("Timestamp" >= '2013-05-22 20:00:00-04'::timestamp with time zone) AND ("Timestamp" <= '2013-05-24 20:00:00-04'::timestamp with time zone) AND ("KeyTag" = 16))"
"Total runtime: 411044.175 ms"
Thanks a lot for help!!
explain analyze
. More on posting this kind of questions: wiki.postgresql.org/wiki/Slow_Query_Questions – a_horse_with_no_name May 15 at 20:11explain (buffers, analyze)
output, etc. – Craig Ringer May 16 at 1:21Sort Method: external merge Disk: 9896kB
- this is not very good. What happens when you try to raisework_mem
to over 10 MBs? – dezso May 16 at 14:34