Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

We have a table with an indexed array column:

CREATE TABLE mention (
  id SERIAL,
  phraseIds integer[],
  PRIMARY KEY (id)
);
CREATE INDEX indx_mentions_phraseIds on mention USING GIN (phraseids public.gin__int_ops);

Queries using the "overlaps" operator on this column don't seem to use the index:

explain analyze select m.id FROM mention m WHERE m.phraseIds && ARRAY[11638,11639];

Seq Scan on mention m  (cost=0.00..933723.44 rows=1404 width=4) (actual time=103.018..3751.525 rows=1101 loops=1)
Filter: (phraseids && '{11638,11639}'::integer[])
Rows Removed by Filter: 7019974
Total runtime: 3751.618 ms

Is it possible to get Postgresql to use the index? Or should we be doing something else?

Update: I repeated the test with 'SET enable_seqscan TO off' and the index is still not used.

Update: I should have mentioned that I am using 9.2 with the intarray extension.

Update: It seems that the intarray extension is part of this problem. I re-created the table without using the intarray extension and the index is used as expected. Anyone know how to get the index to be used with the intarray extension? The docs (http://www.postgresql.org/docs/9.2/static/intarray.html) say that indexes are supported for &&.

share|improve this question
1  
Use SET enable_seqscan TO off and see first if PG can use the index, it should, and then if the index scan will be faster than the seq scan. Post the result here, please. –  MatheusOl Aug 15 '13 at 19:15
 
Why is storing an array of id numbers in each row the best design for your table? –  Mike Sherrill 'Catcall' Aug 15 '13 at 21:47
 
The data was previously in MySQL and we used a number of separate columns (phrase0, phrase1, ...) to hold the data. Using a separate table was very slow on MySQL (many millions of rows in both tables and we need to sort the result and do limit). Using a Postgres array seemed like a good thing to do. –  David Tinker Aug 16 '13 at 5:20
add comment

1 Answer

up vote 2 down vote accepted

I built a similar table in PostgreSQL 9.2; the difference was USING GIN (phraseids); I don't seem to have int_ops available in this context for some reason. I loaded a few thousand rows of random (ish) data.

Setting enable_seqscan off let PostgreSQL use the index.

PostgreSQL calculated the cost of a sequential scan to be less than the cost of a bitmap heap scan. The actual time of a sequential scan was 10% the actual time of a bitmap heap scan, but the total run time for a sequential scan was a little more than the total run time of a bitmap heap scan.

share|improve this answer
 
Ok. Not using the intarray extension on the index for this column sorts it out. Thats what we are going to do for now. Tx. –  David Tinker Aug 16 '13 at 6:23
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.