Tell me more ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

It is bothering me that for a simple query, I have to write out so many sub-selects and WITH statements.

The question is: are there basic guidelines on how to simplify queries that have subqueries?

Here's my query:

WITH cte_min
     AS (SELECT a.client_id,
                a.specimen_source,
                a.received_date
         FROM   f_accession_daily a
                JOIN (SELECT DISTINCT f.client_id,
                                      f.received_date,
                                      f.accession_daily_key
                      FROM   F_ACCESSION_DAILY f
                             JOIN (SELECT CLIENT_ID,
                                          Min(received_date) MinRecDate
                                   FROM   F_ACCESSION_DAILY
                                   GROUP  BY CLIENT_ID) i
                               ON f.CLIENT_ID = i.CLIENT_ID
                                  AND f.RECEIVED_DATE = i.MinRecDate) b
                  ON a.ACCESSION_DAILY_KEY = b.ACCESSION_DAILY_KEY),
     cte_max
     AS (SELECT a.client_id,
                a.specimen_source,
                a.received_date
         FROM   f_accession_daily a
                JOIN (SELECT DISTINCT f.client_id,
                                      f.received_date,
                                      f.accession_daily_key
                      FROM   F_ACCESSION_DAILY f
                             JOIN (SELECT CLIENT_ID,
                                          Max(received_date) MaxRecDate
                                   FROM   F_ACCESSION_DAILY
                                   GROUP  BY CLIENT_ID) i
                               ON f.CLIENT_ID = i.CLIENT_ID
                                  AND f.RECEIVED_DATE = i.MaxRecDate) b
                  ON a.ACCESSION_DAILY_KEY = b.ACCESSION_DAILY_KEY),
     cte_est
     AS (SELECT DISTINCT client_id,
                         MLIS_DATE_ESTABLISHED
         FROM   D_CLIENT
         WHERE  REC_ACTIVE_FLG = 1
                AND MLIS_DATE_ESTABLISHED IS NOT NULL)
SELECT DISTINCT f.client_id,
                cmin.specimen_source,
                cmin.received_date,
                cmax.specimen_source,
                cmax.received_date,
                cest.MLIS_DATE_ESTABLISHED
FROM   F_ACCESSION_DAILY f
       LEFT JOIN cte_max cmax
         ON cmax.CLIENT_ID = f.CLIENT_ID
       LEFT JOIN cte_min cmin
         ON cmin.CLIENT_ID = f.CLIENT_ID
       LEFT JOIN cte_est cest
         ON cest.CLIENT_ID = f.CLIENT_ID 

I am not asking necessarily for you to do the simplification yourself (although I would be very grateful for this), rather I am asking for general guidelines/directions on re-writing this query to be more elegant.

share|improve this question
6  
I wouldn't necesssarily rewrite it just for aesthetics if it already works. Are you having any kind of performance problems with it? – DJ Quimby Sep 25 '12 at 18:23
2  
I definitely wouldn't rewrite it because you don't like subqueries. If it works and is fast enough, why on earth touch it? CTEs are used to simplify readability, so removing them would likely make the query a lot harder to follow. – JNK Sep 25 '12 at 19:15
Could you share an execution plan please? – Paul White Sep 25 '12 at 20:29

3 Answers

up vote 3 down vote accepted

Does this look any better?

;WITH minmax AS (
SELECT client_id, specimen_source, received_date,
       RMin = row_number() over (partition by Client_id
                                 order by received_date, accession_daily_key),
       RMax = row_number() over (partition by Client_id
                                 order by received_date desc, accession_daily_key desc)
FROM F_ACCESSION_DAILY
)
SELECT f.client_id,
       max(case when rmin=1 then f.specimen_source end),
       max(case when rmin=1 then f.received_date end),
       max(case when rmax=1 then f.specimen_source end),
       max(case when rmax=1 then f.received_date end),
       D.MLIS_DATE_ESTABLISHED
FROM   minmax f
LEFT JOIN D_CLIENT D ON D.REC_ACTIVE_FLG = 1 AND D.MLIS_DATE_ESTABLISHED IS NOT NULL
WHERE 1 in (f.rmin, f.rmax)
GROUP BY f.client_id, D.MLIS_DATE_ESTABLISHED
share|improve this answer
holy holy crap now THIS is what i call elegant. WOW! – Артём Царионов Sep 25 '12 at 21:49
Where this might be different than OP is more than one ACCESSION_DAILY_KEY on the min/max (received_date). Still +1 – Blam Sep 25 '12 at 23:56
@Blam - I had considered that, but given they have the same received_date which was part of the question's DISTINCT, that's a moot point. – RichardTheKiwi Sep 26 '12 at 0:01
I was going to argue as it is not filtered in the first distinct but realized it is filtered in the final distinct. Man that is a messy query. Already gave you a +1. – Blam Sep 26 '12 at 0:20

50 rows reporting 5 values and in all of that only two tables are referenced.

In the first CTE you have 4 joins (or virtual joins) to the same table and no other table involved reporting 3 columns. Don't know the key so cannot conclude it can be reduced.

If a cte is not reference more than once then it does not result in less lines of code.

For one this cte can be replaced with less code.

cte_est
     AS (SELECT DISTINCT client_id,
                         MLIS_DATE_ESTABLISHED
         FROM   D_CLIENT
         WHERE  REC_ACTIVE_FLG = 1
                AND MLIS_DATE_ESTABLISHED IS NOT NULL)
...
cest.MLIS_DATE_ESTABLISHED
...
LEFT JOIN cte_est cest
         ON cest.CLIENT_ID = f.CLIENT_ID

reduces to

D_CLIENT.MLIS_DATE_ESTABLISHED
...
LEFT JOIN  D_CLIENT 
       ON  D_CLIENT.CLIENT_ID = f.CLIENT_ID 
       AND D_CLIENT.REC_ACTIVE_FLG = 1
       AND D_CLIENT.MLIS_DATE_ESTABLISHED IS NOT NULL
share|improve this answer

While I am not sure if everyone would consider this simpler and/or easier to read, this is how I would do it:

WITH 
  cte_MaxMinRecvd As
(
    SELECT      CLIENT_ID,   
                Min(received_date) MinRecDate,
                Max(received_date) MaxRecDate
    FROM        F_ACCESSION_DAILY
    GROUP BY    CLIENT_ID
)
, cte_MaxMinDaily As
(
    SELECT      *
    FROM        F_ACCESSION_DAILY f
    JOIN        cte_MaxMinRecvd   i     ON  f.CLIENT_ID = i.CLIENT_ID
)
, cte_min AS 
(
    SELECT  a.client_id,
            a.specimen_source,
            a.received_date
    FROM    F_ACCESSION_DAILY a
    WHERE   EXISTS(
                    SELECT  *
                    FROM    cte_MaxMinDaily f
                    WHERE   f.RECEIVED_DATE       = f.MinRecDate
                      AND   a.ACCESSION_DAILY_KEY = f.ACCESSION_DAILY_KEY
                   )
)
, cte_max AS 
(
    SELECT  a.client_id,
            a.specimen_source,
            a.received_date
    FROM    f_accession_daily a
    WHERE   EXISTS(
                    SELECT  *
                    FROM    cte_MaxMinDaily f
                    WHERE   f.RECEIVED_DATE       = f.MinRecDate
                      AND   a.ACCESSION_DAILY_KEY = f.ACCESSION_DAILY_KEY
                   )
)
SELECT DISTINCT 
            f.client_id,
            cmin.specimen_source,
            cmin.received_date,
            cmax.specimen_source,
            cmax.received_date,
            cest.MLIS_DATE_ESTABLISHED
FROM        F_ACCESSION_DAILY f
LEFT JOIN   cte_max  cmax        ON  cmax.CLIENT_ID = f.CLIENT_ID
LEFT JOIN   cte_min  cmin        ON  cmin.CLIENT_ID = f.CLIENT_ID
LEFT JOIN   D_CLIENT cest        ON  cest.CLIENT_ID = f.CLIENT_ID 
                                 AND cest.REC_ACTIVE_FLG = 1
                                 AND cest.MLIS_DATE_ESTABLISHED IS NOT NULL

Mainly what I did was to

  1. Turn most of the subqueries into CTEs, where applicable,
  2. Merge the Min and Max subqueries together, and
  3. Change the DISTINCT subqueries into EXISTS subqueries, which can be simpler (and usually perform better)

Ooops, I also got rid of the cte_est CTE as Blam suggested..

share|improve this answer
Sort of works as a general pattern, but you haven't really unravelled the group by/min/max pattern which really was a row_number() solution in legacy form. – RichardTheKiwi Sep 25 '12 at 22:33
Yes, I considered that approach, but it appeared to me that I would have to make assumptions about which columns were keys/unique ... – RBarryYoung Sep 26 '12 at 0:20

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.