11

Is it possible to create an index on a column of type text array? I tried using GIN indexes, but queries do not seem to be using them.

-- Example:
CREATE TABLE users (
   name VARCHAR(100),
   groups TEXT[]
);

-- Query: 
SELECT name FROM users WHERE 'Engineering' = ANY(groups);

Also, what is the most efficient way to perform a GROUP BY on the groups column, so that it returns each group together with its count?

2
  • 3
    What is preventing you from normalization? Feb 9, 2011 at 15:19
The actual table has multiple multi-valued columns. The DB is relatively normalized except for these columns, where I wanted to avoid excessive joins because they were slowing down performance. The 'users' table I used is just an example.
    – Anoop
    Feb 9, 2011 at 15:41

3 Answers

7

A GIN index can be used:

CREATE TABLE users (
 name VARCHAR(100),
 groups text[]
);

CREATE INDEX idx_users ON users USING GIN(groups);

-- disable sequential scan in this test:
SET enable_seqscan TO off;

EXPLAIN ANALYZE
SELECT name FROM users WHERE  groups @> (ARRAY['Engineering']);

Result:

"Bitmap Heap Scan on users  (cost=4.26..8.27 rows=1 width=218) (actual time=0.021..0.021 rows=0 loops=1)"
"  Recheck Cond: (groups @> '{Engineering}'::text[])"
"  ->  Bitmap Index Scan on idx_users  (cost=0.00..4.26 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)"
"        Index Cond: (groups @> '{Engineering}'::text[])"
"Total runtime: 0.074 ms"

Using aggregate functions on an array is another problem; the function unnest() might help.
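
As a rough sketch (not part of the original answer), unnest() can expand each array into one row per element, after which an ordinary GROUP BY gives the per-group counts the question asks for:

-- Expand each groups array into rows, then count members per group.
-- The alias group_name is just for illustration.
SELECT group_name, count(*) AS member_count
FROM (SELECT unnest(groups) AS group_name FROM users) AS g
GROUP BY group_name;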

Why don't you normalize your data? That will fix all problems, including many problems you haven't encountered yet.

4
I think I didn't use ARRAY['Engineering'] in the query, and as a result the GIN index never got used. Regarding normalization, the actual table and use case are different from what I mentioned. The actual table has multiple multi-valued columns represented as arrays of strings. I was trying to avoid multiple joins, as the expected number of records per table is in the millions.
    – Anoop
    Feb 9, 2011 at 15:35
  • 1
I think the operator @> makes all the difference; it looks like ANY() can't use the index (see the sketch after these comments). Feb 9, 2011 at 15:39
  • Specifically when you expect large intermediate and/or result tables you should make your data as accessible to the DBMS as you can. Normalization allows it to work with a reference in several cases where otherwise you'd work with values. Feb 9, 2011 at 15:46
Using EXPLAIN I see no difference between the operator @> and any(); both give an identical plan: explain select count(*) from kwetsbaar.kwo where bijztype @> (array['WON'])::varchar[]; Finalize Aggregate (cost=505166.56..505166.57 rows=1 width=8) -> Gather (cost=505166.14..505166.55 rows=4 width=8) Workers Planned: 4 -> Partial Aggregate (cost=504166.14..504166.15 rows=1 width=8) -> Parallel Seq Scan on kwo (cost=0.00..500662.60 rows=1401414 width=0) Filter: (bijztype @> '{WON}'::character varying[]) with a GIN index and 8 million rows.
    – Jan
    Feb 3, 2022 at 15:53
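
To make the distinction from these comments concrete, here is a sketch (not part of the original thread) of the two query forms against the example table; only the containment form can use the GIN index:

-- Can use the GIN index (array containment operator):
SELECT name FROM users WHERE groups @> ARRAY['Engineering'];

-- Cannot use the GIN index (per-element equality):
SELECT name FROM users WHERE 'Engineering' = ANY(groups);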
1

I think the best way to handle this would be to normalize your model. The following will probably contain errors as I didn't try it, but the idea should be clear:

CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE);
CREATE TABLE groups (id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE);
CREATE TABLE user_group (
    -- "user" and "group" are reserved words, so the columns are named user_id/group_id
    user_id INTEGER NOT NULL REFERENCES users,
    group_id INTEGER NOT NULL REFERENCES groups);
CREATE UNIQUE INDEX user_group_unique ON user_group (user_id, group_id);

SELECT users.name
    FROM user_group
    INNER JOIN users ON user_group.user_id = users.id
    INNER JOIN groups ON user_group.group_id = groups.id
    WHERE groups.name = 'Engineering';

The resulting execution plan should be fairly efficient already; you can optimize further by indexing user_group (group_id), which allows an index scan rather than a sequential scan to find the members of a particular group.
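
For example, a sketch of that index (the index name is arbitrary; column names follow the corrected schema above):

CREATE INDEX user_group_group_idx ON user_group (group_id);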

1
I wanted to avoid normalization for this particular use case. That is the reason why I was forced to use the array-of-strings data type.
    – Anoop
    Feb 9, 2011 at 15:32
0

There is an array_to_tsvector() function as of PostgreSQL 9.6.

https://pgpedia.info/a/array_to_tsvector.html

So it may be quite efficient to create tsvectors and use a GIN index on them.
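
A minimal sketch of that idea, assuming PostgreSQL 9.6+ and the users table from the question (the index name is hypothetical; note that array_to_tsvector() keeps the array elements as unstemmed lexemes, so the search term must match exactly):

-- Expression index over the array converted to a tsvector:
CREATE INDEX idx_users_groups_tsv ON users USING GIN (array_to_tsvector(groups));

-- Query with the same expression so the index can be used:
SELECT name FROM users WHERE array_to_tsvector(groups) @@ 'Engineering'::tsquery;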

