up vote 3 down vote favorite
share [g+] share [fb]

I'm looking for a easy function to get the number of datasets (of any table) containing NULL values. It's not important how many columns are involved.

For example:

CREATE TABLE tab1 (
id INTEGER PRIMARY KEY,
dat1 VARCHAR,
dat2 VARCHAR);

INSERT INTO tab1 VALUES 
(1, NULL, 'abc'), 
(2, 'abc', NULL), 
(3, 'abc', 'abc');

SELECT count(*)
FROM tab1
WHERE id IS NULL OR dat1 IS NULL OR dat2 IS NULL;
-- 2

(I know that Primary Keys are not likely to habe NULLs ;)

My current solution uses R to build the query. Any suggestions?

link|improve this question
What version of postgres are you on? – Jack Douglas Jun 6 '11 at 13:15
feedback

2 Answers

up vote 2 down vote accepted

You will need to use dynamic SQL to acheive this, perhaps like this:

testbed:

create role stack;
create schema authorization stack;
set role stack;

create table my_table as 
select generate_series(0,9) as id, null::integer as val;

create table my_table2 as 
select g as id, case when g<=5 then null::integer else 1 end as val
from generate_series(0,9) g;

function:

create function get_with_nulls_rowcount(p_schema in text, p_table in text) 
                returns integer language plpgsql as $$
declare 
  n integer;
  s text;
begin
  --
  select array_to_string(array_agg(column_name::text||' is null'), ' or ')
  into s
  from information_schema.columns
  where table_schema=p_schema and table_name=p_table;
  --
  execute 'select count(*) from '||p_table||' where '||s into n;
  return n;
end;$$;

query:

select table_schema, table_name, 
       get_with_nulls_rowcount(table_schema, table_name)
from information_schema.tables
where table_schema='stack';

result:

 table_schema | table_name | get_with_nulls_rowcount
--------------+------------+-------------------------
 stack        | my_table   |                      10
 stack        | my_table2  |                       6
(2 rows)
link|improve this answer
feedback

Add anything to NULL and the result is NULL. So I'd go by using:

SELECT count(*)
FROM tab1
WHERE --id IS NULL OR  /*PK is NULL? :-)*/
    (dat1 + dat2) IS NULL;

But that it's just easier to write, I don't think it's faster than you solution.

link|improve this answer
Thanks for the quick answer. The query for a paticular table seems to be quite clear - I'm actually looking for a function which can calculate this for any given table! – FloE Jun 6 '11 at 11:21
1  
I don't think there is a function for showing if a row has or not null values, but I'll better let the wiser guys tell you. If not, I would build a dynamic sql statement that goes to the system tables and shows the table's columns and build a statement similar to the one from the answer. – Marian Jun 6 '11 at 11:47
feedback

Your Answer

 
or
required, but never shown

Not the answer you're looking for? Browse other questions tagged or ask your own question.