Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have a python list of values and a postgresql table with a certain column in it. I'd like to know for each element in my python list whether there is any row in the table with that ID.

For example, suppose I have this python list:

vals = [4, 8, 15, 16, 23, 42]

and also that the query:

select my_col from my_table;

gives:

[4, 5, 6, 7, 8]

Then I'd like a query that returns:

[True, True, False, False, False, False]

I could loop through the list and execute a new "select exists" for each value, but I wondered if there was a way to do it in a single call?

I am restricted to postgresql 9.0

share|improve this question
    
Why not just use 2 lines of Python after the select? One to make that result into set s, then just [x in s for x in vals]? –  Alex Martelli Jan 8 at 17:48
    
So pull the entire DB column into python? That would work. It's going to have a few hundred million entries at some point so not sure if that would be a little bad for performance? (The column will be indexed.) –  JoeZuntz Jan 8 at 17:51
    
The example there is just for illustration - I'm not really selecting the full column in my code. –  JoeZuntz Jan 8 at 17:51

2 Answers 2

up vote 1 down vote accepted

This question is more about SQL than aboyt Python or psycopg. I'd use a query like:

SELECT my_col = ANY(your_array_here) FROM my_table;

to get result in "table order" or:

SELECT A.x = ANY(SELECT my_col FROM my_table) 
  FROM (SELECT * FROM unnest(your_array_here) x) A;

to get the result in "vals order".

Fortunately enough psycopg provides a default adapter that converts Python lists to PostgreSQL arrays and the code is extremely simple:

curs.execute("SELECT my_col = ANY(%s) from my_table", (vals,))

or:

curs.execute("""SELECT A.x = ANY(SELECT my_col FROM my_table) 
                  FROM (SELECT * FROM unnest(%s) x) A""", (vals,))

Note that the bound variable argument should be a dict or a tuple and you want to bind the full list to a single variable in the query, meaning that you should use a 1-element tuple ((vals,)) instead of trying to pass vals directly.

share|improve this answer
    
This is perfect - many thanks. And you're right this is more a sql question! –  JoeZuntz Jan 9 at 10:01

I think this requires a mix of string formatting and placeholders (because you need one %s per item in vals):

vals = [4, 8, 15, 16, 23, 42]

query = 'select distinct(my_col) from my_table where my_col in ('
query += ', '.join(['%s'] * len(vals))
query += ')'

cursor.execute(query, vals)
theset = {t[0] for t in cursor.fetchall()}

theboollist = [v in theset for v in vals]

This approach should guarantee that the amount of data you're sending to the DB (for the where ... in clause) and the amount you're getting back from it are both O(N) where N equals len(vals); I think it would be logically impossible to do better than that in big-O terms.

share|improve this answer
1  
Would it help to use select distinct my_col to guarantee len(cursor.fetchall()) <= N? –  HappyLeapSecond Jan 8 at 18:11
    
@unutbu sure, if duplications are possible in my_col -- hadn't considered that case, now editing to fix, tx. –  Alex Martelli Jan 8 at 18:36

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.