Your query appears to:
- Find the three names that appear most in the overall data set
- Extract the year from the data, producing (year, name) pairs
- and for each year count the number of times each name appears
- Output the count for each name in a separate column with the name as its alias
All of this, except the final step, can be done in plain SQL.
SELECT
extract(year FROM year1) AS "year",
ame,
count(ame) AS ame_count
FROM yourtable
WHERE ame IN (
SELECT ame
FROM yourtable
GROUP BY ame
ORDER BY count(ame) DESC
LIMIT 3
)
GROUP BY 1, 2;
per http://sqlfiddle.com/#!15/be97f/7
The last bit is what's called a cross-tabulation or pivot. Unfortunately, PostgreSQL sucks at these, they're really annoying to write. You have to write them yourself with dynamic SQL or use the crosstab
function from the tablefunc
extension. I'll do the latter.
SELECT
*
FROM
crosstab('
SELECT
extract(year FROM year1)::integer AS "year",
ame,
count(ame)::integer AS ame_count
FROM yourtable
WHERE ame IN (
SELECT ame
FROM yourtable
GROUP BY ame
ORDER BY count(ame) DESC
LIMIT 3
)
GROUP BY 1, 2
ORDER BY 1, 2
','
SELECT ame
FROM yourtable
GROUP BY ame
ORDER BY count(ame) DESC
LIMIT 3
')
top_names("year" integer, "first" integer, "second" integer, "third" integer);
but even then, the column aliases aren't what you want:
year | first | second | third
------+-------+--------+-------
2001 | 1 | 1 |
2002 | 1 | 1 |
2003 | 2 | | 3
2004 | 1 | 1 |
(4 rows)
Unfortunately, you'll have the same problem in custom PL/PgSQL. There's no way for a function to specify the column aliases dynamically at runtime. But we can be a bit tricky, and have the function generate a query as a cursor and return the cursor, then fetch the cursor. That leads to something like this monstrosity:
BEGIN;
DO
LANGUAGE plpgsql
$$
DECLARE
top_names text[];
name_sql text := '';
"name" text;
curs refcursor := 'top_names';
BEGIN
SELECT INTO STRICT top_names
array_agg(ame)
FROM (
SELECT
ame
FROM yourtable
GROUP BY ame
ORDER BY count(ame) DESC
LIMIT 3
) topnames(ame);
FOREACH "name" IN ARRAY top_names
LOOP
name_sql := format(
'%s, sum(case when ame = %L then 1 end) as %I',
name_sql, "name", "name"
);
END LOOP;
OPEN curs FOR EXECUTE format(
'
SELECT
extract(year FROM year1) AS "year"
%s
FROM yourtable
WHERE ame = ANY(%L)
GROUP BY 1
ORDER BY 1
',
name_sql,
top_names
);
RAISE NOTICE '%', curs;
END;
$$;
FETCH ALL FROM "top_names";
COMMIT;
producing:
regress=> FETCH ALL FROM "top_names";
year | tim | pol | kim
------+-----+-----+-----
2001 | 1 | 1 |
2002 | 1 | 1 |
2003 | 2 | | 3
2004 | 1 | 1 |
(4 rows)
If you want to include years with zero matches you'll need to do a left outer join on the list of distinct years, too.
PREPARE in PostgreSQL
orDEALLOCATE in PostgreSQL
etc. – Radu Gheorghiu Jun 6 at 9:16