5

In my query I want find rows that match one of many LIKE operators. I know 3 ways of doing that but only one of them can use index.

Lets start with table:

CREATE TABLE dir (
    id BIGSERIAL PRIMARY KEY,
    path TEXT NOT NULL
);

CREATE INDEX path_idx ON dir(path TEXT_pattern_ops);

After inserting sample data I can do:

EXPLAIN ANALYZE 
SELECT id, path FROM dir
   WHERE path LIKE 'A%'
      OR path LIKE 'B%'
      OR path LIKE 'C%';

Above query use index correctly.

Second way:

EXPLAIN ANALYZE 
SELECT id, path FROM dir
  WHERE path LIKE ANY(ARRAY['A%', 'B%', 'C%']::TEXT[]);

This query will not use index. Last approach I know:

CREATE TABLE patterns (pattern) AS VALUES
('A%'),
('B%'),
('C%');

EXPLAIN ANALYZE 
SELECT id, path FROM dir
  JOIN patterns ON (dir.path LIKE patterns.pattern);

This query like the previous one will not use index.

Here is SQL Fiddle for those how want to play with those queries: http://sqlfiddle.com/#!17/24031/2

Problem: Query with path LIKE X OR path LIKE Y is completely unreadable for many patterns (number of patterns may vary from few to hundreds or small thousands) and I am afraid that large query may be slow to parse or even to hit 1GB limit of query length (some patterns may have very long prefixes).

Question: is there any oder method returning same results that would not require putting all patterns directly in query (like in this option with join)?

6
  • I'm using Postgres 9.6, but can upgrade if necessary. Commented May 2, 2018 at 11:22
  • Regarding the prefixes, may we ask about the business problem you have which would require such long prefixes? For instance, if a prefix be 1000 characters, then why couldn't it just be 500 characters, and have it do the same thing? Commented May 2, 2018 at 11:26
  • In my original problem dir table contains paths to directories on filesystem. With this query I want to find all subdirectries but only for some dirs satisfying given criteria. So what I have to do right now is to create query to get those dirs from DB, create SQL with SqlAlchemy: path LIKE X for each founded dir. It looks like unnecessary steps to me and the only thing that blocks me from changing that is that all other approaches do not use index. Commented May 2, 2018 at 11:36
  • 1
    I would look into ltree. it would take additional effort to "translate" path to ltree, but you would have cosy operators and index support
    – Vao Tsun
    Commented May 2, 2018 at 11:43
  • Unfortunately ltree has very limited character set. In my case path column can contain any UTF-8 character. Commented May 2, 2018 at 11:53

1 Answer 1

2

You can create a trigram index that will support your query.

For that you need the pg_trgm extension; run the following as superuser:

CREATE EXTENSION pg_trgm;

Then you can create a GIN index:

CREATE INDEX ON dir USING gin (path gin_trgm_ops);

This index can be used with your second and third approach, so it should do the trick for you.

With short patterns like the one in your examples, the index will not be very effective.

You can also use a GiST index, that will probably be smaller, but slower to search.

Note that you can use that index also with patterns that start with %.

8
  • 1
    I belive OP is asking if he can make query analyzer to use index when pattern comes implicitly from joined table. like - if you put array to a table and join array_table on path LIKE ANY(pattern_array); seq scan appears. so he asks if you can hide pattern list in query and use index. If I understood OP right of course :)
    – Vao Tsun
    Commented May 2, 2018 at 12:16
  • True; I have rephrased the answer (the index can be used in that case too). Commented May 2, 2018 at 12:32
  • 1
    thank you. I would appreciate the comment, why join on any(array) works and join on like pattern does not. If you would have a minute for that of course. I'm sure many people will find it extremely interesting, in particular me
    – Vao Tsun
    Commented May 2, 2018 at 12:42
  • I am really surprised that this works. Why is there a difference which index type is created? Why in my case (btree index) planner is not choosing to use index and with trgm index it uses it. It is completely counter intuitive to me. Commented May 2, 2018 at 12:58
  • I believe this is correct answer for my question. Unfortunately my previous experience with GIST index was very poor - it turned out that during autovacuum index is traversed in a logical order which translates into random disk accesses which makes it unusable in my case (index is almost 1TB large). Do you know it it also apply to GIN index? Commented May 2, 2018 at 13:07

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.