Take the 2-minute tour ×
Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

So I basically have a table with lots of inserts, setting one of the fields (uploaded_at) to NULL. Then a periodic task selects all the tuples WHERE uploaded_at IS NULL, processes them and updates, setting uploaded_at to current date.

How should I index the table?

I understand that I should use a partial index like CREATE INDEX foo ON table (uploaded_at) WHERE uploaded_at IS NULL or smth like that. I'm a bit confused though if it is correct to index on a field that is always NULL. Or if it is correct to use a b-tree index. Hash looks like a better idea, but it is obsolete and is not replicated via streaming hot-standby replication. Any advice would be greatly appreciated.

EDIT:

I've experimented a bit with the following indices:

"foo_part" btree (uploaded_at) WHERE uploaded_at IS NULL "foo_part_id" btree (id) WHERE uploaded_at IS NULL and the query analyser seem to always choose the foo_part index. explain analyse also yields slightly better result for an index foo_part index:

Index Scan using foo_part on t1  (cost=0.28..297.25 rows=4433 width=16) (actual time=0.025..3.649 rows=4351 loops=1)
   Index Cond: (uploaded_at IS NULL)
 Total runtime: 4.060 ms

vs

Bitmap Heap Scan on t1  (cost=79.15..6722.83 rows=4433 width=16) (actual time=1.032..4.717 rows=4351 loops=1)
   Recheck Cond: (uploaded_at IS NULL)
   ->  Bitmap Index Scan on foo_part_id  (cost=0.00..78.04 rows=4433 width=0) (actual time=0.649..0.649 rows=4351 loops=1)
 Total runtime: 5.131 ms
share|improve this question

1 Answer 1

up vote 4 down vote accepted

In this special case the column actually indexed is irrelevant for the query at hand. You can pick any column. I would pick something else than uploaded_at, which is useless. Some column that may be useful for other queries and is not bigger than 8 bytes ideally.

CREATE INDEX foo ON table bar (some_col) WHERE uploaded_at IS NULL;

If you have no use case for any other column, it's still best to stick with the useless uploaded_at, so not to introduce additional maintenance cost for the index and some restrictions for H.O.T. UPDATE. More:

share|improve this answer
    
Could it maybe be a serial id field for example? –  teferi Oct 31 '14 at 19:47
    
@teferi: a serial is as good as any. The point is whether there are actually queries to make use of it. –  Erwin Brandstetter Nov 1 '14 at 0:53

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.