I have a table temp it has 118507658 records . I need to clean up this table based on below condition.

If three columns that is (dp_content_definition_id,dp_order ,value_scope_id) are identical then delete these record but keep max and second max entry_date record.

I wrote this query to find the records .

select value_id ,content_definition_id ,order ,value_scope_id ,entry_date 
from temp ou
where (select count(*) from dp_values inr
where 
inr.content_definition_id = ou.content_definition_id 
and inr.order = ou.order 
and inr.value_scope_id = ou.value_scope_id ) > 3
order by content_definition_id,order ,value_scope_id
 limit 10000;

But my first problem is that it takes so much time just to find 10000 records and no idea how much time it will take to find to take all records. after that i dont know query to how to delete .

share|improve this question
up vote 0 down vote accepted

Simply use a Windowed Aggregate Function, create a ROW_NUMBER based on descending dates and look for those with a value >=3.

select 
    select value_id ,content_definition_id ,order ,
    value_scope_id ,entry_date,
from
 (
    select value_id ,content_definition_id ,order ,
       value_scope_id ,entry_date,
       row_number(*)
       over (partition by content_definition_id ,order ,value_scope_id 
             order by entry_date desc ) as rn
    from temp ou
 ) dt
where rn >= 3
order by content_definition_id,order ,value_scope_id;

This will return the data to be deleted. If you change the condition to where rn < 3 you get the data you want to keep. Depending on the percentage it might be more efficient to Insert/Select the remaining rows into a new table instead of deleting the old ones.

share|improve this answer
    
Oh wao solution looks great but i did not understand the query .Even i got the result also .Thanks for the quick reply . – Sudarshan Thakur Jan 8 '16 at 8:40
    
If possible can you explain this please .. – Sudarshan Thakur Jan 8 '16 at 8:41
    
@SudarshanThakur: As the name implies ROW_NUMBER adds a sequential number to each row within the same PARTITION, i.e. the number is based on decending entry_date and reset to 1 whenever content_definition_id ,order ,value_scope_id changes. Those Windowed Aggregate Functions are supported by most DBMSes (besides MySQL) and really simplify lots of complicated queries (e.g. accessing previous row's data/aggregation plus detail rows/ranking). simple-talk.com/sql/learn-sql-server/… – dnoeth Jan 8 '16 at 8:57
    
Thanks for the explanation.I am running in the proper environment but it has taken more than 45 minutes to get records from 11 crore records and query is still running . – Sudarshan Thakur Jan 8 '16 at 9:21
    
@SudarshanThakur: You probably need proper indexing, in best case matching the PARTITION/ORDER BY. – dnoeth Jan 8 '16 at 9:24

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.