Tell me more ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have a Messages table that looks like this:

                    Messages
+-----+------------+-------------+--------------+
|  id |  sender_id | receiver_id |  created_at  |
+-----------------------------------------------+
|  1  |      1     |      2      |   1/1/2013   |
|  2  |      1     |      2      |   1/1/2013   |
|  3  |      2     |      1      |   1/2/2013   |
|  4  |      3     |      2      |   1/2/2013   |
|  5  |      3     |      2      |   1/3/2013   |
|  6  |      5     |      4      |   1/4/2013   |
+-----------------------------------------------+

Where a 'thread' is a group of messages between a given sender_id and receiver_id I want a query to return the most recent 10 messages for the most recent 10 threads where either the sender_id or receiver_id is a given id.

Expected output where given user_id is 5:

+-----+------------+-------------+--------------+
|  id |  sender_id | receiver_id |  created_at  |
+-----------------------------------------------+
|  1  |      5     |      2      |   1/4/2013   |
|  2  |      5     |      2      |   1/4/2013   |
|  3  |      2     |      5      |   1/4/2013   |
|  4  |      3     |      5      |   1/4/2013   |
|  5  |      5     |      2      |   1/3/2013   |
|  6  |      5     |      4      |   1/3/2013   |
+-----------------------------------------------+

up to a limit of 10 messages between, for example, user 5 and 2 (above there are 4) and a limit of 10 threads (above there are 3).

I've been trying with this sort of query using a subquery but haven't managed to get the second limit on the number of distinct threads.

SELECT * FROM (SELECT DISTINCT ON (sender_id, receiver_id) messages.* 
FROM messages 
WHERE (receiver_id = 5 OR sender_id = 5) ORDER BY sender_id, receiver_id, 
created_at DESC)   
q ORDER BY created_at DESC 
LIMIT 10 OFFSET 0;

I'm considering creating a new Thread table containing a thread_id field which would be the concatenation of sender_id + receiver_id and then just joining on Messages but I have a sneaky suspicion that it should be doable with just one table.

share|improve this question
Can you show the query you have tried and expected output based on the sample data you have showed above? – bonCodigo Feb 5 at 18:19
Just to be clear are you asking for 1 message per thread (returning at most 10 rows) or 10 messages per thread (returning at most 100 rows). – couling Feb 5 at 18:41
@couling I am asking for 10 messages per thread (returning at most 100 rows) – johnnymire Feb 6 at 7:47
@johnnymire okay I've modified my answer to fit this. However as stated in the answer I think you're better to break this into multiple queries. – couling Feb 6 at 11:19
1  
@johnnymire: Most recent thread assumes that one message of the thread gives the thread's date. You should specify if it's the oldest message (start of thread), or the more recent (latest thread update). – Daniel Vérité Feb 6 at 11:46
show 1 more comment

5 Answers

I would suggest taking couling's answer and slightly modifying it so that it does provide effectively two queries using a common table expression:

WITH threads (sender_id, receiver_id, latest) as (
        select sender, 
               receiver,
               max(sent) 
          from sof_messages
         where receiver = <user>
            or sender = <user>
         group by sender,
               receiver
         order by 3
         limit 10
 ), 
 messages ([messages fields listed here], rank) as (
         select m.*, 
                rank() over (partition by (sender, receiver), order by sent desc)
           from sof_messages
          WHERE (sender, receiver) in (select (sender, receiver) from threads))
 SELECT * from messages where rank <= 10;

This has the advantage of allowing the planner to have a pretty good idea of when to use indexes here. In essence each of the three pieces to the query is independently planned.

share|improve this answer

I'm posting this to show what can be done.

I do not really recommend using it.

It would be much better to do two separate queries: 1 to retrieve the 10 most recent threads and 1 repeated to pull back the 10 most recent messages for each thread.

However you may achieve your goal with the rank() window function as shown below.

select * from (
      select message.*,
             rank() over (partition by message.sender, message.receiver 
                              order by sent desc )  
      from sof_messages message,
           (
            select sender, 
                   receiver,
                   max(sent) 
              from sof_messages
             where receiver = <user>
                or sender = <user>
             group by sender,
                   receiver
             order by 3
             limit 10
           ) thread
      where message.sender = thread.sender
        and message.receiver = thread.receiver
      ) message_list

where rank <= 10

There are a couple of different queries which will achieve your goal with window functions, none of them particularly clean.

share|improve this answer

Creating a Thread table looks wrong because of data duplication, but a view may help:

CREATE VIEW threads AS 
  SELECT sender_id, receiver_id, min(created_at) AS t_date
  FROM messages
  GROUP BY sender_id,receiver_id;

Change min(created_at) to max(created_at) if the thread's date is to be the date of its most recent message, rather than the oldest one.

Then it can be joined back to messages simply with:

SELECT ... FROM messages JOIN threads USING (sender_id,receiver_id)
share|improve this answer

The tidiest query I could imagine to resolve your problem within one query is the following one:

select * from (
  select row_number() 
    over (partition by sender_id, receiver_id order by created_at desc) as rn, m.*
  from Messages m
  where (m.sender_id, m.receiver_id) in (
    select sender_id, receiver_id
    from Messages
    where sender_id = <id> or receiver_id = <id>
    group by sender_id, receiver_id
    order by max(created_at) desc
    limit 10 offset 0
  )
) res where res.rn <= 10

The row_number() over (partition by sender_id, receiver_id order by created_at desc) column will contain the row number of each message within each thread (it will be like the record number if you run a separate query to query just for one thread). Apart from this row number you query the message itself if it is contained in the 10 topmost threads (that is made by that (m.sender_id, m.receiver_id) in ...query.... And finally as you want just 10 topmost messages you limit the row number to be lower or equal to 10.

share|improve this answer

I haven't tested this, but it looks like you forgot the LIMIT 10 on your subquery which gives you the 10 most-recent threads:

SELECT
  *
FROM
  (SELECT DISTINCT ON
     (sender_id, receiver_id) messages.* 
   FROM
     messages 
   WHERE
     (receiver_id = 5 OR sender_id = 5)
   ORDER BY
     sender_id, receiver_id, created_at DESC
   LIMIT
     10)   
  q
ORDER BY
  created_at DESC 
LIMIT
  10
OFFSET
  0;

(I've pretty-printed the SQL so it is easier to tell what is going on.)

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.