Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. Join them; it only takes a minute:

I have a table (named table) with a JSONB field data, which contains a variable-length array, e.g.

{"label": "xyz", "items": [ ... ]}

I created an index on the length of the "items" element:

CREATE INDEX n_items ON table ( JSONB_ARRAY_LENGTH(data->'items') )

but I still get a sequential scan when I filter on it:

EXPLAIN ANALYZE SELECT COUNT(*) FROM table WHERE JSONB_ARRAY_LENGTH(table.data->'items') = 2;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=2565655.67..2565655.68 rows=1 width=8)
   ->  Seq Scan on table (cost=0.00..2535256.19 rows=12159794 width=8)
         Filter: (jsonb_array_length((table.data -> 'items'::text)) = 2)
 Planning time: 0.121 ms
 Execution time: 482891.694 ms

That's about 8 minutes to filter! Did I do something wrong here, or is this the consequence of PostgreSQL not keeping statistics on JSON(B) objects? It should be possible to flatten out this data column, but I'd like to be sure that's what I need to do before I start working on it.

edit: these array lengths do not vary much. There are only 4 distinct values in the data currently, and I don't expect to have many more. Is the index just not very useful in this case, or can I improve filtering some other way?
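For what it's worth, PostgreSQL does gather statistics for expression indexes once the table is analyzed; they appear in pg_stats under the index's name. A sketch of how to check what the planner actually knows about the expression (using the table and index names from above):

```sql
-- Refresh statistics so the planner sees the expression index
ANALYZE "table";

-- Expression-index statistics are listed under the index name;
-- n_distinct and most_common_vals reflect what the planner knows
-- about jsonb_array_length(data->'items')
SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'n_items';
```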


Without knowing your data, I can only guess that the selectivity of your index is low (which happens if the length of the array does not vary much).

One trick to overcome this might be changing the query slightly and creating a covering index. For this, choose a NOT NULL column (for example, the primary key of the table) to count, and then include this column in the index:

CREATE INDEX n_items ON your_table (jsonb_array_length(data->'items'), id);

SELECT count(id) 
  FROM your_table
 WHERE jsonb_array_length(your_table.data->'items') = 2;

This will hopefully turn into an index-only scan (I tested this omitting the jsonb part, but you will be able to tell if it works).
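A sketch of the full sequence, assuming the primary key is named id (note that index-only scans also depend on the visibility map being reasonably current, so a VACUUM may be needed before the planner picks one):

```sql
CREATE INDEX n_items ON your_table (jsonb_array_length(data->'items'), id);

-- Index-only scans require an up-to-date visibility map
VACUUM ANALYZE your_table;

EXPLAIN ANALYZE
SELECT count(id)
  FROM your_table
 WHERE jsonb_array_length(data->'items') = 2;
-- Success looks like "Index Only Scan using n_items" in the plan
```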

The selectivity is in fact low. In this case does the index not actually help filtering performance on queries other than counts? – ssdecontrol Sep 16 at 14:53
    
@ssdecontrol Well, it depends - there is a chance it won't help much, unless the same two columns are involved. – dezso Sep 16 at 15:12
    
Let me rephrase: is an index on a low-selectivity column not generally useful? – ssdecontrol Sep 16 at 16:22
