I've got this table (PostgreSQL 9.3):

x | y  
- | -  
1 | 1  
2 | 2  
2 | 3  
1 | 4

I want to split it into three groups: with the rows ordered by y, every time the value of x changes, a new dense_rank value should be assigned. I tried the following:

SELECT x, y, dense_rank() OVER (PARTITION BY x ORDER BY y)
FROM tab

But with the partition over x the ordering is not working the way I expect. The result is

x  y  dense_rank  
-  -  ----------  
1  1  1     
1  4  2     
2  2  1     
2  3  2 

instead of the expected:

x  y  dense_rank  
-  -  ----------  
1  1  1     
2  2  2     
2  3  2     
1  4  3 

Now I am not sure why the window is not ordered by y.

In the second step I need this rank for a grouping (GROUP BY dense_rank, x). So in the end I need the following result:

x  y      dense_rank  
-  -      ----------  
1  1      1     
2  {2,3}  2         
1  4      3 

Maybe this could be achieved in an easier way?

When you partition by x and order by y within a window function, Postgres has to sort the rows by x, y, which is why your output is ordered the way it is (x=1, y=4 comes before x=2, y=2) – Kamil G. Feb 9 at 16:50
    
@ConsiderMe thanks. That's the point I missed. – S-Man Feb 9 at 18:50

partition over x the ordering is not working the way I expect

It is working perfectly fine. When you partition by x, the first 1 and the last 1 end up in the same group.

Window Functions:

The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.
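
To see which rows share a partition, one can list every y of the current row's partition next to each row. This is only a sketch: the inline VALUES list stands in for the question's table, and the column name ys_in_partition is made up for illustration.

```sql
-- Inline copy of the question's data (stands in for the real table).
WITH tab(x, y) AS (
  VALUES (1, 1), (2, 2), (2, 3), (1, 4)
)
SELECT x, y,
       -- Frame covers the whole partition, so every row sees all y values
       -- that belong to the same x.
       array_agg(y) OVER (PARTITION BY x ORDER BY y
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND UNBOUNDED FOLLOWING) AS ys_in_partition
FROM tab
ORDER BY y;
```

Both rows with x = 1 should report {1,4}: y = 1 and y = 4 sit in the same partition even though they are not adjacent when sorted by y, which is why dense_rank() numbers them 1 and 2.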

To get the result you want, you could use the following (a classic example of the gaps-and-islands problem):

SELECT *, ROW_NUMBER() OVER (ORDER BY y) -
          ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) + 1 AS group_id
FROM tab
ORDER BY group_id


Output:

╔═══╦═══╦══════════╗
║ x ║ y ║ group_id ║
╠═══╬═══╬══════════╣
║ 1 ║ 1 ║        1 ║
║ 2 ║ 2 ║        2 ║
║ 2 ║ 3 ║        2 ║
║ 1 ║ 4 ║        3 ║
╚═══╩═══╩══════════╝

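Warning:
This solution is not general. The computed value only distinguishes islands that share the same x, so rows with different x can receive the same group_id. It is safe only when it is combined with x, for example in GROUP BY group_id, x.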
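
The limitation becomes visible when the two row numbers are shown side by side. The sketch below uses the question's data plus one extra row (2, 5); that extra row is an assumption modelled on the extended example described in the comments, and the column names rn_all and rn_x are illustrative.

```sql
-- Question's data plus an assumed fifth row (2, 5).
WITH tab(x, y) AS (
  VALUES (1, 1), (2, 2), (2, 3), (1, 4), (2, 5)
)
SELECT x, y,
       ROW_NUMBER() OVER (ORDER BY y)                AS rn_all,  -- position over all rows
       ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) AS rn_x,    -- position within the same x
       ROW_NUMBER() OVER (ORDER BY y) -
       ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) + 1 AS group_id
FROM tab
ORDER BY y;
```

For these five rows the differences work out to group_id 1, 2, 2, 3, 3: the rows (1, 4) and (2, 5) share group_id 3 although x changes between them. The pair (x, group_id) still identifies each run of equal x values, so grouping by both columns keeps working.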

EDIT:

A more general solution is to use LAG to get the previous value and a windowed SUM:

WITH cte AS
(
  SELECT t1.x, t1.y, LAG(x) OVER(ORDER BY y) AS x_prev
  FROM tab t1
)
SELECT x,y, SUM( CASE WHEN x = COALESCE(x_prev,x) THEN 0 ELSE 1 END) 
            OVER(ORDER BY y) + 1 AS group_id
FROM cte
ORDER BY group_id;
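
For the second step asked about in the question (one row per group, with the y values collected), the query above could be extended roughly as follows. This is a sketch rather than part of the original answer: the inline VALUES list replaces the real table, and the names grouped and ys are illustrative.

```sql
WITH tab(x, y) AS (
  -- Inline copy of the question's data.
  VALUES (1, 1), (2, 2), (2, 3), (1, 4)
),
cte AS (
  -- Previous x in y order; NULL for the first row.
  SELECT x, y, LAG(x) OVER (ORDER BY y) AS x_prev
  FROM tab
),
grouped AS (
  -- Running count of x changes gives one id per run of equal x values.
  SELECT x, y,
         SUM(CASE WHEN x = COALESCE(x_prev, x) THEN 0 ELSE 1 END)
             OVER (ORDER BY y) + 1 AS group_id
  FROM cte
)
SELECT x, array_agg(y ORDER BY y) AS ys, group_id
FROM grouped
GROUP BY group_id, x
ORDER BY group_id;
```

With the four sample rows this should yield three rows, with ys equal to {1}, {2,3} and {4}. Single values come back as one-element arrays rather than bare numbers, which differs slightly from the layout sketched in the question.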


Wow. OK, it is not the way I expected. Is there no way to use rank() instead of two row_number() calls? I understood the rank function to count the different partitions. But I found my error: I thought there was first an ordering and then the partitioning. But of course it is only the ordering within the partition. That makes sense now. So thank you for understanding. Is there a way to make partitions after the ordering? So that I order by y first and then do the partitions by x... – S-Man Feb 9 at 16:53
    
Hi, I thought about your solution for a while and wondered why it works. It does for my special case because I will group the result on x and group_id. But in my opinion it is not a general solution for the partitioning problem. If I expand my example, the group_id does not change when x changes: link. The 5th row contains x = 2, so it differs from the 4th. In my case it has to get a new group_id, but it has the same one as the 4th row (ordered by y). Is there a general solution? – S-Man Feb 9 at 18:47
