Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I would like to generate a pivot table using py-pandas from this kind of data

id      product  credit
1        book      -5
1        ipad     -15
1      server     -25
2        book      -5
15      server     -25
2       glass      -2
2       glass      -2
1        book      -5
15       glass      -2
1         car    -150

to that sort of spreadsheet

id        1          2        15
---------------------------------
book     -5 (2)     -5(1)     NA
ipad     -15(1)      NA       NA
server   -25(1)      NA      -25(1)
glass     NA        -2(2)    -2(1)
car       -150(1)    NA       NA

That will show the id as columns, product as rows, unit credit and number of of products bought.

Thanks for your help

-H

share|improve this question

1 Answer 1

Main idea is to use pandas...pivot_table().

If you just want to sum your data, then np.sum will do:

>>> df.pivot_table(cols='id', values='credit', rows='product', aggfunc=np.sum)
id        1   2   15
product             
book     -10  -5 NaN
car     -150 NaN NaN
glass    NaN  -4  -2
ipad     -15 NaN NaN
server   -25 NaN -25

Or you can use collections.Counter to get data in format close to your needs (Counter is not very performant, so be careful about this one):

>>> from collections import Counter
>>> df.pivot_table(cols='id', values='credit', rows='product', aggfunc=Counter)
id              1        2         15
product                              
book       {-5: 2}  {-5: 1}       NaN
car      {-150: 1}      NaN       NaN
glass          NaN  {-2: 2}   {-2: 1}
ipad      {-15: 1}      NaN       NaN
server    {-25: 1}      NaN  {-25: 1}

Or define custom function to get exactly what you need:

>>> from collections import defaultdict
>>> def hlp_count(x):
...     d = defaultdict(int)
...     for v in x:
...         d[v] += 1
...     # join in case you have more than one distinct price
...     return ', '.join(['{0} ({1})'.format(k, v) for k, v in d.iteritems()])

>>> df.pivot_table(cols='id', values='credit', rows='product', aggfunc=hlp_count)
id             1       2        15
product                           
book       -5 (2)  -5 (1)      NaN
car      -150 (1)     NaN      NaN
glass         NaN  -2 (2)   -2 (1)
ipad      -15 (1)     NaN      NaN
server    -25 (1)     NaN  -25 (1)
share|improve this answer
    
Thank you ! that's awesome. –  Herve Meftah Dec 6 '13 at 21:49

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.