Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I'm using PostgreSQL (9.2.0) and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips
(
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id )
)

I've added indices on both begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num
  ON ips
  USING btree
  (begin_ip_num );

CREATE INDEX index_ips_on_end_ip_num
  ON ips
  USING btree
  (end_ip_num );

The Query being used is:

SELECT "ips".* FROM "ips" WHERE (3065106743 BETWEEN begin_ip_num AND end_ip_num);

The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms

I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.

share|improve this question
2  
I don't see your query and your version of Postgres? This obviously isn't a "a table of ip addresses", but a table of IP ranges. Please explain what you are trying to achieve. –  Erwin Brandstetter Jan 18 '13 at 21:22
    
Sorry about that. I've added the query being used. –  Zain Zafar Jan 18 '13 at 21:34
    
The version is 9.2.0. –  Zain Zafar Jan 18 '13 at 21:43
    
If IP ranges have a significant impact in the db's target business than you should consider using the ip4r extension. –  Clodoaldo Neto Jan 19 '13 at 10:07

3 Answers 3

up vote 6 down vote accepted

Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.

With the index I propose, Postgres can scan the first column and find the address, where the rest of the index fulfills the first condition. Then it can, for each value of the first column, return all rows that fulfill the second condition, until the first one fails. Then jump to the next value of the first column, etc.
This is still not very effective and Postgres may be faster just scanning the first index column and filtering for the second. Very much depends on your data distribution.

What would really help here is a GiST index for a int8range column, available since PostgreSQL 9.2.

Barring that, you can check out this closely related answer on dba.SE with a rather sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance.

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. Can help performance a lot with if you have lots of rows per value of the first column. Else it is not effective.

Also, is autovaccuum running or have you run ANALYZE on the table? You need current statistics for Postgres to pick appropriate query plans.

share|improve this answer
    
Thanks a lot! I've settled with the composite index. Adding DESC to end_ip_num did make a difference. Just as a side not, the composite index you've mentioned should only work >= 9.2.0, right? –  Zain Zafar Jan 18 '13 at 22:32
1  
@ZainZafar: The multicolumn (= composite) index I mention works in all versions this side of the millennium (at least since 7.3, probably longer). The linked solution on dba.SE has been tested with 9.1. The range type in combination with a GiST index requires 9.2. –  Erwin Brandstetter Jan 18 '13 at 22:36

I had exactly this same problem on a nearly identical dataset from maxmind.com's free geiop table. I solved it using Erwin's tip about range types and GiST indexes. The GiST index was key. Without it I was querying at best about 3 rows per second. With it I queried nearly 500000 rows in under 10 seconds! Since Erwin didn't post detailed instructions on how to do this, I thought I'd add them, here...

First of all, you must add a new column having the range type, note that int8range is required for bigint types. Next set its values appropriately, note that the '[]' parameter indicates to make the range inclusive at lower and upper bounds (rtfm). Finally add the index, note that the GiST index is where all the performance advantage comes from.

alter table ips add column iprange int8range;
update ips set iprange=int8range(begin_ip_num, end_ip_num, '[]');
create index index_ips_on_iprange on ips using gist (iprange);

Having laid the groundwork, you can now use the '<@' contained-by operator to search specific addresses against the table. See http://www.postgresql.org/docs/9.2/static/functions-range.html

SELECT "ips".* FROM "ips" WHERE (3065106743 <@ iprange);
share|improve this answer

I believe your query looks like WHERE [constant] BETWEEN begin_ip_num AND end_ipnum or

As far as I know Postgres doesn't have "AND-EQUAL " access plan, so you need to add a composite index on 2 columns as suggested by Erwin Brandstetter.

share|improve this answer
    
I've tried using a composite index and it results in a sequential scan. –  Zain Zafar Jan 18 '13 at 21:46

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.