Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have a datamining problem with a very large number of attributes (>15,000). I represent the data in multiple tables with a common Id field. Having deleted empty attributes, I want to merge the resulting smaller tables so I can calculate correlations to remove redundent attributes.
I can join the tables easily with:

select * from lrg_bin0 k0
  join lrg_bin1 k1 on k0.id = k1.id
  join lrg_bin2 k2 on k0.id = k2.id
  join lrg_bin3 k3 on k0.id = k3.id

but when I try to create a new table with:

Create table mrg0 as (
  select * from lrg_bin0 k0
    join lrg_bin1 k1 on k0.id = k1.id
    join lrg_bin2 k2 on k0.id = k2.id
    join lrg_bin3 k3 on k0.id = k3.id
) 

Postgress complains with 'column "id" specified more than once'

The very large number of attributes means I am doing everything programatically so answers suggesting queries I can generate would be appreciated.

share|improve this question

2 Answers 2

up vote 3 down vote accepted

Use using in instead of on

create table mrg0 as 
select *
from
    lrg_bin0 k0
    join lrg_bin1 k1 using (id)
    join lrg_bin2 k2 using (id)
    join lrg_bin3 k3 using (id)

From the manual:

Also, USING implies that only one of each pair of equivalent columns will be included in the join output, not both

share|improve this answer
    
Thank you. This does the job perfectly. –  Paul Smith Apr 15 at 14:19

The problem is that in SQL when you perform a join using on, the matching column will be included once per table it's found in. What you can do is specify which table you want the column from ala:

Create table mrg0 as (
  select k1.id, ... from lrg_bin0 k0
    join lrg_bin1 k1 on k0.id = k1.id
    join lrg_bin2 k2 on k0.id = k2.id
    join lrg_bin3 k3 on k0.id = k3.id
) 
share|improve this answer
    
I like your solution because it's explicit vs USING which is implied. –  Bob Apr 15 at 13:32
    
@Bob ??? What is not explicit in using? –  Clodoaldo Neto Apr 15 at 13:58
    
This approach requires that each column is specified individually which is not practical in this context. –  Paul Smith Apr 15 at 14:21
    
I "personally" don't like to see * in my code that's all. With this approach I would have to name out my columns explicitly and alias them in order to keep them from clashing or not include the duplicate id column. When I read the code I would know exactly which columns are being pulled into my target table. –  Bob Apr 15 at 17:12
    
@PaulSmith how isn't this practical? Because you have to name out the id you want from the table you want? –  Bob Apr 15 at 17:19

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.