postgres GROUP BY and ORDER BY problem

Question

I have two tables like this:

CREATE TABLE cmap5 (
   name     VARCHAR(2000),
   lexemes      TSQUERY 
);

and

 CREATE TABLE IF NOT EXISTS synonyms_all_gin_tsvcolumn (
    cid         int NOT NULL references pubchem_compounds_index(cid) ON UPDATE CASCADE ON DELETE CASCADE,
    name            VARCHAR(2000) NOT NULL,  
    synonym         VARCHAR(2000) NOT NULL,
    tsv_syns                TSVECTOR,
    PRIMARY KEY (cid, name, synonym)
);

My current query is:

SELECT s.cid, s.synonym, c.name, ts_rank(s.tsv_syns,c.lexemes,16) 
FROM synonyms_all_gin_tsvcolumn s, cmap5 c WHERE c.lexemes @@ s.tsv_syns

and the output is:

cid     |  synonym                              | name (query)              | rank
5474706 | 10-Methoxyharmalan                    | 10-methoxyharmalan        | 0.0901673
1416    | (+/-)12,13-EODE                       | 12,13-EODE                |  0.211562
5356421 | LEUKOTOXIN B (12,13-EODE)             | 12,13-EODE                |  0.211562
 180933 | 1,4-Chrysenequinone                   | 1,4-chrysenequinone       |  0.211562
5283035 | 15-Deoxy-delta-12,14-prostaglandin J2 | 15-delta prostaglandin J2 |  0.304975
5311211 | 15-deoxy-delta 12 14-prostaglandin J2 | 15-delta prostaglandin J2 |  0.304975
5311211 | 15-deoxy-Delta(12,14)-prostaglandin J2| 15-delta prostaglandin J2 |  0.304975
5311211 | 15-Deoxy-delta-12,14-prostaglandin J2 | 15-delta prostaglandin J2 |  0.304975
5311211 | 15-Deoxy-delta 12, 14-Prostaglandin J2| 15-delta prostaglandin J2 |  0.304975

I would like to return the matches for every row of cmap5 against my main table, ranked by the ts_rank function. In addition, for each row in cmap5 I want to either:

 -- select only the best X cids to each query (group by cid)
 -- or ORDER BY my results as 1+ts_rank/count(cid)

To get the best match I added a SELECT DISTINCT ON (c.name), but when the rank is the same I want to get the cid with more matches to the query. I have tried adding a simple GROUP BY at the end of the query, but I get an error. How could I do this?

I have read your question a couple of times. It's just not clear what the result should look like. Which columns shall be included exactly? Please add an example to clarify. Also: why include the irrelevant FK constraint to pubchem_compounds_index in CREATE TABLE? It just makes the test case fail.
There's another table to which this one is linked, but it's irrelevant to this query. What I want is, on the one hand, for results whose rank is the same (e.g. 5283035 and 5311211 above) to get 5311211 as the top result, because that cid has more hits than 5283035. So I want to take the number of hits per cid into account in the rank, something like final_rank = 1 + ts_rank(cid) / no. of hits(cid).
On the other hand, I want to get the first X cids per query name. If I use LIMIT X, it returns the first X results for the entire query table, not the first X per name (row) of the query table, which is what I want.
Please edit your question and add the example there ("edit" link under the question at the left). All substantial information should go into the question, comments are too hard to read for that.

Erwin Brandstetter · Answer 1 · 2013-03-01 03:46:44Z

I guess what you want is this, because it would make sense:

SELECT DISTINCT ON (c.name)
       c.name, min(s.synonym) AS min_synonym, s.cid
      ,ts_rank(s.tsv_syns,c.lexemes,16) AS rnk
      ,count(*) AS ct
FROM   synonyms_all_gin_tsvcolumn s
JOIN   cmap5                      c ON c.lexemes @@ s.tsv_syns
GROUP  BY c.name, rnk, s.cid
ORDER  BY c.name, rnk DESC, ct DESC

I use explicit ANSI JOIN syntax (doing the same as your CROSS JOIN plus WHERE clause). It's generally considered superior (easier to read and debug). I also use rnk as column name since I avoid function names as identifiers.
Group the results per c.name that have the same rnk and s.cid, take min(s.synonym) (for lack of definition in the question), and count(*) the peers per group.
Narrow the result down to one row per c.name with DISTINCT ON (a Postgres-specific extension of the SQL standard DISTINCT), taking the highest rank first and, among equal ranks, the highest peer count first.
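
The tie-breaking described above can be tried on a small scale with the following minimal sketch. The hits CTE and its rows are made up for illustration (loosely modeled on the output in the question) and stand in for the join result. The trailing cid in ORDER BY is an extra tie-breaker added here only to make the outcome deterministic.

```sql
-- Illustrative stand-in for the joined rows; names and values are assumptions.
WITH hits (name, cid, rnk) AS (
   VALUES
     ('15-delta prostaglandin J2', 5283035, 0.304975)
   , ('15-delta prostaglandin J2', 5311211, 0.304975)
   , ('15-delta prostaglandin J2', 5311211, 0.304975)
   , ('12,13-EODE',                1416,    0.211562)
   , ('12,13-EODE',                5356421, 0.211562)
)
SELECT DISTINCT ON (name)          -- keep one row per name
       name, cid, rnk, count(*) AS ct
FROM   hits
GROUP  BY name, cid, rnk           -- one group per (name, cid, rank)
ORDER  BY name, rnk DESC, ct DESC, cid;  -- best rank, then most hits, then cid
```

With this toy data, '15-delta prostaglandin J2' resolves to cid 5311211 (ct = 2), because it has more rows than 5283035 at the same rank, and '12,13-EODE' resolves to cid 1416 (ct = 1) through the added cid tie-breaker.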

Combining GROUP BY and DISTINCT ON this way in the same query level is possible since DISTINCT ON is applied after GROUP BY.
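
DISTINCT ON returns only one row per c.name. For the other request in the comments (the first X cids per name), a window function is one possible approach. The following is only a sketch under assumptions: it collapses each cid to its best rank with max(), uses 3 as an arbitrary placeholder for X, and adds cid as a final tie-breaker. None of these choices are specified in the question.

```sql
SELECT name, cid, rnk, ct
FROM  (
   SELECT name, cid, rnk, ct
          -- number the cids within each name: best rank first, then most hits
        , row_number() OVER (PARTITION BY name
                             ORDER BY rnk DESC, ct DESC, cid) AS rn
   FROM  (
      SELECT c.name, s.cid
           , max(ts_rank(s.tsv_syns, c.lexemes, 16)) AS rnk  -- assumption: best rank per cid
           , count(*) AS ct                                  -- number of matching synonyms
      FROM   synonyms_all_gin_tsvcolumn s
      JOIN   cmap5                      c ON c.lexemes @@ s.tsv_syns
      GROUP  BY c.name, s.cid
      ) grouped
   ) ranked
WHERE  rn <= 3        -- X = 3 is a placeholder
ORDER  BY name, rn;
```

Unlike LIMIT, which caps the whole result set, the filter on rn applies per name, so each row of cmap5 contributes at most X cids.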

