I have a big table ~15M records called entities. I want to find top 5 entities matching "hockey" in their name.
I have a Full text index built for that, which is used : gin_ix_entity_full_text_search_name, which indexes the name.
Query
SELECT "entities".*,
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'hockey'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE "entities"."place" = 'f'
AND (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'hockey'::text))
ORDER BY "rank0.48661998202865475" DESC LIMIT 5
Duration 25,623 ms
Explain plan
1 Query plan Limit (cost=12666.89..12666.89 rows=5 width=3116)
2 Query plan -> Sort (cost=12666.89..12670.18 rows=6571 width=3116)
3 Query plan Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''hockey'''::tsquery))
4 Query plan -> Bitmap Heap Scan on entities (cost=124.06..12645.06 rows=6571 width=3116)
5 Query plan Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)
6 Query plan Filter: (NOT place)
7 Query plan -> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..123.74 rows=6625 width=0)
8 Query plan Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''hockey'''::tsquery)
I do not understand why it verifies the index condition twice. (Query plan step 4 and 7). Is it because of my boolean condition (not Place?) If so, I should add it to my index and I should get a very fast query? Or is it the sorting condition which makes it very long?
Thanks helping me understand the Query plan and how to fix my 25 seconds query!
EDIT--------------------- Here's the explain analyze :
QUERY PLAN
Limit (cost=4447.28..4447.29 rows=5 width=3116) (actual time=18509.274..18509.282 rows=5 loops=1)
-> Sort (cost=4447.28..4448.41 rows=2248 width=3116) (actual time=18509.271..18509.273 rows=5 loops=1)
Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''test'''::tsquery))
Sort Method: top-N heapsort Memory: 19kB
-> Bitmap Heap Scan on entities (cost=43.31..4439.82 rows=2248 width=3116) (actual time=119.003..18491.408 rows=2533 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''test'''::tsquery)
Filter: (NOT place)
-> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..43.20 rows=2266 width=0) (actual time=74.093..74.093 rows=2593 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''test'''::tsquery)
Total runtime: 18509.381 ms
(10 rows)
Here are my DB parameters . It is an online DB hosted by Heroku, on Amazon services. They describe it as having 1.7GB of ram, 1 processing unit and a DB of max 1TB.
name | current_setting
version | PostgreSQL 9.0.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit
archive_command | test -f /etc/postgresql/9.0/main/wal-e.d/ARCHIVING_OFF || envdir /etc/postgresql/9.0/resource29857_heroku_com/wal-e.d/env wal-e wal-push %p
archive_mode | on
archive_timeout | 1min
checkpoint_completion_target | 0.7
checkpoint_segments | 40
client_min_messages | notice
cpu_index_tuple_cost | 0.001
cpu_operator_cost | 0.0005
cpu_tuple_cost | 0.003
effective_cache_size | 1530000kB
hot_standby | on
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
listen_addresses | *
log_checkpoints | on
log_destination | syslog
log_line_prefix | %u [YELLOW]
log_min_duration_statement | 50ms
log_min_messages | notice
logging_collector | on
maintenance_work_mem | 64MB
max_connections | 500
max_prepared_transactions | 500
max_stack_depth | 2MB
max_standby_archive_delay | -1
max_standby_streaming_delay | -1
max_wal_senders | 10
port |
random_page_cost | 2
server_encoding | UTF8
shared_buffers | 415MB
ssl | on
syslog_ident | resource29857_heroku_com
TimeZone | UTC
wal_buffers | 8MB
wal_keep_segments | 127
wal_level | hot_standby
work_mem | 100MB
(39 rows)
EDIT 2===========================
Looks like the ORDER BY is the slow part :
d6ifslbf0ugpu=> EXPLAIN ANALYZE SELECT "entities"."name",
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'banana'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'banana'::text))
LIMIT 5;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=43.31..53.07 rows=5 width=24) (actual time=76.583..103.623 rows=5 loops=1)
-> Bitmap Heap Scan on entities (cost=43.31..4467.60 rows=2266 width=24) (actual time=76.581..103.613 rows=5 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
-> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..43.20 rows=2266 width=0) (actual time=53.592..53.592 rows=1495 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
Total runtime: 103.680 ms
(6 rows)
VS with ORDER BY STATEMENT
d6ifslbf0ugpu=> EXPLAIN ANALYZE SELECT "entities"."name",
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'banana'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'banana'::text))
ORDER BY "rank0.48661998202865475" DESC LIMIT 5;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4475.12..4475.13 rows=5 width=24) (actual time=15013.735..15013.741 rows=5 loops=1)
-> Sort (cost=4475.12..4476.26 rows=2266 width=24) (actual time=15013.732..15013.735 rows=5 loops=1)
Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''banana'''::tsquery))
Sort Method: top-N heapsort Memory: 17kB
-> Bitmap Heap Scan on entities (cost=43.31..4467.60 rows=2266 width=24) (actual time=0.872..15006.763 rows=1495 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
-> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..43.20 rows=2266 width=0) (actual time=0.549..0.549 rows=1495 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
Total runtime: 15013.805 ms
(9 rows)
What I still don't understand, is why this is slower. Looks like it's fetching the same amount of rows from Bitmap Heap Scan , but takes so much longer.