I have a PostgreSQL 9.1 database that contains 10M+ rows and some text fields that need similarity and %word%
-like searches, so I decided to use trigram indexes.
Initially, I started to use GIN indexes, but now I am wondering whether I should rather use GIST.
The theory says:
- GIN index lookups are about three times faster than GiST
- GIN indexes take about three times longer to build than GiST
- GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled (see Section 58.4.1 for details)
- GIN indexes are two-to-three times larger than GiST indexes
I have tried to create both GIN
and GIST
indexes and found the following in practice (on my dataset):
- For
LIKE '%word%'
queries, GIN is about the same speed (or even 5-20% slower) as GIST for the first time (when the index for the trigrams in the query are not cached yet). - For
LIKE '%word%'
queries, GIN is about 5 times faster than GIST if the trigrams in the query were searched recently. GIST has always the same speed, regardless of cachedness. - For
% 'word'
(similarity) queries, GIN is about 5-8 times slower than GIST, depending on the cachedness of the index. - GIN is about 10% smaller than GIST. However, it seems to grow faster if there are
UPDATE
statements to the table. Unless, of course, IVACUUM FULL
often enough.
So I see some discrepancies between theory and practice:
- speed: GIST is often faster, except GIN is totally in memory (Can you ensure that in advance?), and even then only for
LIKE '%word%'
queries. - size: GIN is actually smaller. Is this typical for trigram indexes or is my case an outlier?
- May the behaviour be significantly different in 9.4?
- A last question: is it wise to use GIN if there are about 10-50k
UPDATE
statements and 5-10kINSERT
statements daily, and weeklyVACUUM
is probably reasonable (not much usage on the weekend), or will it really hurtUPDATE
andINSERT
performance?