We have a table that has approximately one billion records per month. Considering 18 months of history we are talking about 18 billion records.
This table is partitioned weekly by date (so we have around 74 partitions).
For one of our queries we need to get the last 1000 records of one given unit. Something like this
SELECT code, obs_time
FROM unit_position
WHERE unit_id = 1
ORDER BY obs_time DESC LIMIT 1000;
The problem is that for this we have the following result in explain:
Limit (cost=96181.06..96181.09 rows=10 width=12)
-> Sort (cost=96181.06..102157.96 rows=2390760 width=12)
Sort Key: unit_position .obs_time -> Result (cost=0.00..44517.60 rows=2390760 width=12) -> Append (cost=0.00..44517.60 rows=2390760 width=12) -> Seq Scan on unit_position (cost=0.00..42336.00 rows=2273600 width=12) -> Seq Scan on unit_position_week350 unit_position (cost=0.00..21.60 rows=1160 width=12) -> ... (ALL OTHER PARTITIONS) ... -> Seq Scan on unit_position_week450 unit_position (cost=0.00..21.60 rows=1160 width=12)
In the other hand if we get a query like this (limiting the query to the first interval where we can get the 1000 records) we can >2x faster results:
SELECT fake, obs_time
FROM unit_position
WHERE unit_id = 1
AND obs_time >= NOW() - '7 weeks'::interval
ORDER BY obs_time DESC LIMIT 1000;
The question is, considering that we are ordering by obs_time, is there a way to make the query use the partitions and only search the first n partitions needed?
In most cases the results will be in the most recent 4 partitions (so it would only search those 4 partitions) and only in some very few it would have to search all the partitions.
If after getting n partitions (by order) it finds the 1000 results it won't consider the rest of the partitions (billions of records discarded). The tests/explain show that PostgreSQL is not doing this. It is actually going for all partitions (if it doesn't get WHERE state limiting the QUERY to the constrains of the PARTITIONS. Is there a way to force this? (for example in ORACLE one can give suggestion to the DB engine on how to perform some queries, even though I also don't know if do this for partitions)
The overhead of doing each on of the partitions manually (giving the intervals) get us worst results (and doing these we could actually be working without partitions, it would be better to have different tables).
Any other suggestions?