
By grouping two columns I made some changes.

I generated a file using Python, and it ended up with 2 duplicate columns. How can I remove duplicate columns from a DataFrame?

Do they have the same column name? –  waitingkuo Jun 5 '13 at 11:38

1 Answer

It's probably easiest to use a groupby (assuming they have duplicate names too):

In [11]: df
Out[11]:
   A  B  B
0  a  4  4
1  b  4  4
2  c  4  4

In [12]: df.T.groupby(level=0).first().T
Out[12]:
   A  B
0  a  4
1  b  4
2  c  4
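To reproduce this outside an IPython session, here is a minimal sketch. The sample frame is made up to match the one above. The `columns.duplicated()` mask is an alternative I'm adding, not part of the groupby approach; it keeps the first column for each repeated name and avoids the transpose.

```python
import pandas as pd

# Sample frame with two columns that are both named "B" (illustrative data).
df = pd.DataFrame(
    [["a", 4, 4], ["b", 4, 4], ["c", 4, 4]],
    columns=["A", "B", "B"],
)

# Groupby approach: transpose so column labels become the index,
# keep the first row per label, then transpose back.
by_group = df.T.groupby(level=0).first().T

# Alternative (an addition, not from the answer above): mask out every
# column whose label has already been seen.
by_mask = df.loc[:, ~df.columns.duplicated()]

print(by_group)
print(by_mask)
```

Both variants keep only the first column for each name, so they assume the same-named columns really hold the same data.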

If they have different names you can drop_duplicates on the transpose:

In [21]: df
Out[21]:
   A  B  C
0  a  4  4
1  b  4  4
2  c  4  4

In [22]: df.T.drop_duplicates().T
Out[22]:
   A  B
0  a  4
1  b  4
2  c  4
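The same thing as a standalone script, in case it helps. This is only a sketch with made-up data. The `infer_objects()` call at the end is my addition: transposing a mixed-type frame tends to leave every column as `object`, and this is one way to get numeric dtypes back.

```python
import pandas as pd

# Columns "B" and "C" have different names but identical contents
# (illustrative data).
df = pd.DataFrame({"A": ["a", "b", "c"], "B": [4, 4, 4], "C": [4, 4, 4]})

# Transpose so each column becomes a row, drop the repeated rows
# (the first occurrence is kept), then transpose back.
deduped = df.T.drop_duplicates().T

# After the round trip the columns are object dtype; let pandas
# re-infer better dtypes where it can.
restored = deduped.infer_objects()

print(restored)
print(restored.dtypes)
```

Note that this compares full column contents, so it can be slow on wide or very long frames.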

read_csv will usually ensure they have different names...
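A small sketch of what that looks like, assuming the default read_csv settings in a reasonably recent pandas. The CSV text here is invented for illustration. The repeated header comes back with a numeric suffix, so the content-based approach above is the one that applies.

```python
import io

import pandas as pd

# In-memory CSV with a repeated "B" header (illustrative data).
csv_text = "A,B,B\na,4,4\nb,4,4\nc,4,4\n"

df = pd.read_csv(io.StringIO(csv_text))

# With the defaults, the second "B" is renamed so the labels are unique.
print(list(df.columns))

# The names now differ, so drop duplicates by content instead of by name.
deduped = df.T.drop_duplicates().T
print(list(deduped.columns))
```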

FYI @Andy, there is a new option in 0.11.1 that controls this, mangle_dupe_cols; the default is to mangle (i.e. produce unique column names). In 0.12 this will change to leave dups in place –  Jeff Jun 5 '13 at 12:19
    
@Jeff ah! Thanks for the update :) good feature! –  Andy Hayden Jun 5 '13 at 12:21