Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have a problem in postgresql. I have one cohorte (gathering of people) and i would like counting the persons in this cohorte.

Begin date : "2014-09-01", End date : "2014-11-30".

  • I have 5 persons between 09/01 and 09/22
  • I have 5 persons between 09/20 and 09/25
  • I have 5 persons between 09/26 and 10/05
  • I have 5 persons between 10/01 ans 11/30

I want to have the max of accommodation for each month between the begin date and the end date in SQL (or PHP). Expected max person count:

  • September(09) => 10
  • October(10) => 10
  • November(11) => 5
share|improve this question
    
What is the structure of the tables? –  Marek Jun 25 at 15:52
    
tables : person, contrat and one person = one contrat with one daterange (arrival and departure date) per contrat –  user3775877 Jun 25 at 15:58
    
Add the actual structure of the table in the question. –  Jonast92 Jun 25 at 16:10
    
Question title is not representative. Counting based on several columns? –  mlt Jun 25 at 16:57
1  
@ErwinBrandstetter I think 10 is still correct as at any given day in September it does not exceed 10. –  mlt Jun 25 at 18:39
show 1 more comment

2 Answers

Find the maximum of simultaneously present persons on a single day for every month in a given period.

I suggest generate_series() to produce the series of days in your period. Then aggregate twice:

  • First to get a count for each day. A single day can be dealt with plain BETWEEN. Your ranges are obviously meant to be with include borders.

  • Second to get the maximum per month.

SELECT date_trunc('month', day)::date AS month, max(ct) AS max_ct
FROM  (
   SELECT g.day, count(*) AS ct
   FROM   cohorte
         ,generate_series('2014-09-01'::date  -- first of Sept.
                         ,'2014-11-30'::date  -- last of Nov.
                         ,'1 day'::interval) g(day)
   WHERE  g.day BETWEEN t_begin AND t_end
   GROUP  BY 1
   ) sub
GROUP  BY 1
ORDER  BY 1;

Returns:

month      |  max_ct
-----------+--------
2014-09-01 | 10
2014-10-01 | 10
2014-11-01 | 5

Use to_char() to prettify the month output.

SQL Fiddle .. is down ATM. Here is my test case (that you should have provided):

CREATE TEMP TABLE cohorte (
   cohorte_id serial PRIMARY KEY
  ,person_id  int  NOT NULL
  ,t_begin    date NOT NULL  -- inclusive
  ,t_end      date NOT NULL  -- inclusive
);

INSERT INTO cohorte(person_id, t_begin, t_end)
SELECT g, '2014-09-01'::date, '2014-09-22'::date
FROM   generate_series (1,5) g
UNION ALL
SELECT g+5, '2014-09-20', '2014-09-25'
FROM   generate_series (1,5) g
UNION ALL
SELECT g+10, '2014-09-26', '2014-10-05'
FROM   generate_series (1,5) g
UNION ALL
SELECT g+15, '2014-10-01', '2014-11-30'
FROM   generate_series (1,5) g;

For more complex checks I'd suggest the OVERLAPS operator:
Find overlapping date ranges in PostgreSQL

For more complex scenarios I'd also consider range types:
Preventing adjacent/overlapping entries with EXCLUDE in PostgreSQL

share|improve this answer
1  
Thanks a lot, it works !! :) –  user3775877 Jun 26 at 12:21
add comment

can't you use window function? I'd try something like this (I've not tested this code, just exposed my thoughts)

SELECT max(count) FROM (
  SELECT count(*) OVER (PARTITION BY ???) as count
  FROM contract
  WHERE daterange(dateStart, dateEnd, '[]') && daterange('2014-09-01', '2014-10-01', '[)')
) as max

Here, my problem remains that I can't find a way to partition for each day of the interval. Maybe this is a wrong approach, but I would be interested by a solution based on windows.


edit: with this request, you have the max of simultaneous present, but over all the time, not only a given month

with presence as (
  SELECT id, generate_series(begin_date, end_date, '1 day'::interval) AS date
  FROM test
),
presents as (
  SELECT count(*) OVER (PARTITION BY date) AS count
  FROM presence
)
SELECT max(count) from presents;

Here we come, I think

Imagine your person table has 3 columns :

  • id
  • entrance_date
  • leaving_date

the request would look like

WITH presents as (
  SELECT id, 
         daterange(entrance_date, leaving_date, '[]') * daterange('2014-09-01', '2014-11-30', '[]') as range
  FROM person
  WHERE daterange(entrance_date, leaving_date, '[]') && daterange('2014-09-01', '2014-11-30', '[]')
),
present_per_day as (
  SELECT id,
         generate_series(lower(range), upper(range), '1 day'::interval) AS date
  FROM presents
),
count_per_day as (
  SELECT count(*) OVER (PARTITION BY date) AS count,
         date
  FROM present_per_day
),
SELECT max(count) OVER (PARTITION BY date_part('year', date), date_part('month', date)) as max,
       date_part('year', date),
       date_part('month', date)
FROM count_per_day;

(I have to leave, I hope I'll have time to test it later)

In fact, @erwin solution is much much more easy and efficient than this one.

share|improve this answer
    
Thanks, I thought about this solution but i never used the window function so i don't know which partition we have to use... –  user3775877 Jun 25 at 16:38
    
@circlecode OP has overlaps as it is not a single date column but 2. It would work if all dates were in a single column and a flag was used to indicate arrival/departure. A head on approach would, perhaps, involve union. Unpivoting with unnest might help. –  mlt Jun 25 at 16:52
    
And one usually uses EXTRACT (or PG specific equivalent) to fetch month from date. –  mlt Jun 25 at 16:58
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.