Join the Stack Overflow Community
Stack Overflow is a community of 6.4 million programmers, just like you, helping each other.
Join them; it only takes a minute:
Sign up

What is the best way to find records with duplicate values across multiple columns using Postgres, and Activerecord?

I found this solution here:

User.find(:all, :group => [:first, :email], :having => "count(*) > 1" )

But it doesn't seem to work with postgres. I'm getting this error:

PG::GroupingError: ERROR: column "parts.id" must appear in the GROUP BY clause or be used in an aggregate function

share|improve this question
2  
In regular SQL, I'd use a self-join, something like select a.id, b.id, name, email FROM user a INNER JOIN user b USING (name, email) WHERE a.id > b.id. No idea how to express that in ActiveRecord-speak. – Craig Ringer Feb 10 '14 at 4:48
up vote 78 down vote accepted

Tested & Working Version

User.select(:first,:email).group(:first,:email).having("count(*) > 1")

Also, this is a little unrelated but handy. If you want to see how times each combination was found, put .count at the end:

User.select(:first,:email).group(:first,:email).having("count(*) > 1").count

and you'll get a result set back that looks like this:

{[nil, nil]=>512,
 ["Joe", "[email protected]"]=>23,
 ["Jim", "[email protected]"]=>36,
 ["John", "[email protected]"]=>21}

Thought that was pretty cool and hadn't seen it before.

Credit to Taryn, this is just a tweaked version of her answer.

share|improve this answer
3  
count not work, length does – William Herry Jun 18 '14 at 2:05
6  
I had to pass an explict array to select() as in: User.select([:first,:email]).group(:first,:email).having("co‌​unt(*) > 1").count in order to work. – Rafael Oliveira Oct 6 '14 at 22:27
1  
adding the .count gives PG::UndefinedFunction: ERROR: function count – Magne Jul 8 '15 at 13:01
    
You can try User.select([:first,:email]).group(:first,:email).having("co‌​unt(*) > 1").map.count – Serg0 Aug 4 '15 at 12:40
2  
I'm trying the same method but trying to get the User.id as well, adding it to the select and group returns an empty array. How can I return the whole User model, or at least include the :id? – Ashbury Oct 27 '15 at 8:37

That error occurs because POSTGRES requires you to put grouping columns in the SELECT clause.

try:

User.select(:first,:email).group(:first,:email).having("count(*) > 1").all

(note: not tested, you may need to tweak it)

EDITED to remove id column

share|improve this answer
2  
That's not going to work; the id column is not part of the group, so you cannot refer it unless you aggregate it (e.g. array_agg(id) or json_agg(id)) – Craig Ringer Feb 10 '14 at 4:46

Based on the answer above by @newUserNameHere I believe the right way to show the count for each is

res = User.select('first, email, count(1)').group(:first,:email).having('count(1) > 1')

res.each {|r| puts r.attributes } ; nil
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.