Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. Join them; it only takes a minute:

I have a table (named table) with a JSONB field data, which contains a variable-length array, e.g.

{"label": "xyz", "items": [ ... ]}

I created an index on the length of the "items" element:

CREATE INDEX n_items ON table ( JSONB_ARRAY_LENGTH(data->'items') )

but I still get a sequential scan when I filter on it:

EXPLAIN ANALYZE SELECT COUNT(*) FROM table WHERE JSONB_ARRAY_LENGTH(table.data->'items') = 2;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=2565655.67..2565655.68 rows=1 width=8)
   ->  Seq Scan on table (cost=0.00..2535256.19 rows=12159794 width=8)
         Filter: (jsonb_array_length((table.data -> 'items'::text)) = 2)
 Planning time: 0.121 ms
 Execution time: 482891.694 ms

That's about 8 minutes to filter! Did I do something wrong here, or is this the consequence of PostgreSQL not keeping statistics on JSON(B) objects? It should be possible to flatten out this data column, but I'd like to be sure that's what I need to do before I start working on it.

edit: these array lengths do not vary much. There are only 4 distinct values in the data currently, and I don't expect to have many more. Is the index just not very useful in this case, or can I improve filtering some other way?
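For what it's worth, PostgreSQL does gather statistics for expression indexes once the table is analyzed; they appear in pg_stats under the index's name. A sketch of how to check what the planner actually knows about the expression (using the table and index names from above):

```sql
-- Refresh statistics so the planner sees the expression index
ANALYZE "table";

-- Expression-index statistics are listed under the index name;
-- n_distinct and most_common_vals reflect what the planner knows
-- about jsonb_array_length(data->'items')
SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'n_items';
```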


Without knowing your data, I can only guess that the selectivity of your index is low (which happens if the length of the array does not vary much).

One trick to overcome this might be changing the query slightly and creating a covering index. For this, choose a NOT NULL column (for example, the primary key of the table) to count, and then include this column in the index:

CREATE INDEX n_items ON your_table (jsonb_array_length(data->'items'), id);

SELECT count(id) 
  FROM your_table
 WHERE jsonb_array_length(your_table.data->'items') = 2;

This will hopefully turn into an index-only scan (I tested this omitting the jsonb part, but you will be able to tell if it works).
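A sketch of the full sequence, assuming the primary key is named id (note that index-only scans also depend on the visibility map being reasonably current, so a VACUUM may be needed before the planner picks one):

```sql
CREATE INDEX n_items ON your_table (jsonb_array_length(data->'items'), id);

-- Index-only scans require an up-to-date visibility map
VACUUM ANALYZE your_table;

EXPLAIN ANALYZE
SELECT count(id)
  FROM your_table
 WHERE jsonb_array_length(data->'items') = 2;
-- Success looks like "Index Only Scan using n_items" in the plan
```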

The selectivity is in fact low. In this case does the index not actually help filtering performance on queries other than counts? – ssdecontrol Sep 16 at 14:53
    
@ssdecontrol Well, it depends - there is a chance it won't help much, unless the same two columns are involved. – dezso Sep 16 at 15:12
    
Let me rephrase: is an index on a low-selectivity column not generally useful? – ssdecontrol Sep 16 at 16:22
