Working with set_index in Pandas DataFrame

Question

I imported a CSV file and indexed the resulting DataFrame like this:

 # set_index returns a new DataFrame, so assign the result back
 rdata = rdata.set_index(['race_date', 'track_code', 'race_number', 'horse_name'])
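`set_index` returns a new DataFrame and leaves the original untouched unless the result is assigned (or `inplace=True` is passed). The sketch below rebuilds the same four-level index on a small, hypothetical stand-in for the CSV data (the real file isn't shown, and dates are kept as plain strings for simplicity) and checks what it produced:

```python
import pandas as pd

# Hypothetical stand-in for the imported CSV: a few rows from the question.
rdata = pd.DataFrame({
    "race_date": ["2007-08-24"] * 4,
    "track_code": ["BM"] * 4,
    "race_number": [8] * 4,
    "horse_name": ["Count Me Twice", "Count Me Twice",
                   "Judge's Choice", "Judge's Choice"],
    "work_date": ["2007-05-31", "2007-06-09", "2007-06-07", "2007-06-14"],
    "work_track": ["PLN", "PLN", "BM", "BM"],
})

keys = ["race_date", "track_code", "race_number", "horse_name"]

# set_index does not modify rdata; it returns a new DataFrame.
indexed = rdata.set_index(keys)

print(type(rdata.index).__name__)    # RangeIndex
print(type(indexed.index).__name__)  # MultiIndex
print(list(indexed.index.names))     # the four key columns, in order
print(list(indexed.columns))         # ['work_date', 'work_track']
```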

This is what a section of the DataFrame looks like...

 race_date  track_code race_number horse_name          work_date  work_track
 2007-08-24 BM         8           Count Me Twice     2007-05-31         PLN
                                   Count Me Twice     2007-06-09         PLN
                                   Count Me Twice     2007-06-16         PLN
                                   Count Me Twice     2007-06-23         PLN
                                   Count Me Twice     2007-08-05         PLN
                                   Judge's Choice     2007-06-07          BM
                                   Judge's Choice     2007-06-14          BM
                                   Judge's Choice     2007-07-08          BM
                                   Judge's Choice     2007-08-18          BM

Why isn't the 'horse_name' column grouped like the date, track and race? If this is by design, how can I slice this larger DataFrame by race to get a new DataFrame with 'horse_name' as its index?
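One way to do the slicing, sketched under the assumption that the DataFrame has the four-level index shown above: `DataFrame.xs` selects rows by the values of named index levels and, by default (`drop_level=True`), removes the matched levels from the result. The sample data is hypothetical; the second race (2007-08-25, race 3) and "Other Horse" are made up for illustration.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data (dates as strings).
rdata = pd.DataFrame({
    "race_date": ["2007-08-24"] * 4 + ["2007-08-25"] * 2,
    "track_code": ["BM"] * 6,
    "race_number": [8] * 4 + [3] * 2,
    "horse_name": ["Count Me Twice", "Count Me Twice",
                   "Judge's Choice", "Judge's Choice",
                   "Other Horse", "Other Horse"],
    "work_date": ["2007-05-31", "2007-06-09", "2007-06-07",
                  "2007-06-14", "2007-07-01", "2007-08-01"],
    "work_track": ["PLN", "PLN", "BM", "BM", "BM", "BM"],
})

# Build the four-level index and sort it, a common precaution for MultiIndex lookups.
indexed = rdata.set_index(
    ["race_date", "track_code", "race_number", "horse_name"]
).sort_index()

# Select one race by its first three index levels. xs drops the levels it
# matched, so horse_name is the only index level left on the result.
race = indexed.xs(("2007-08-24", "BM", 8),
                  level=["race_date", "track_code", "race_number"])

print(race)
print(race.index.name)  # horse_name
```

Naming the levels in `xs` keeps the selection readable even if the column order of the index changes later.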

Looks like a bug; the correct place for bug reports is GitHub :) Good find! — Andy Hayden, Aug 6 '13 at 3:23
This question appears to be off-topic because it is a bug report. — Andy Hayden, Aug 6 '13 at 5:22

Viktor Kerkez · Accepted Answer · 2013-08-06 23:17:22Z

It's not a bug. This is exactly how it's intended to work.

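A DataFrame has to show every single item in its data. So if the index has one level, that level is fully expanded. If it has two levels, the first level is grouped and the second is fully expanded; if it has three levels, the first two are grouped and the third is expanded, and so on. The grouping only affects the printed output: an outer-level label is left blank when it, and every level to its left, repeats the row above. The index itself still holds every label.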

That is why the horse name is not grouped. If the horse name were grouped as well, how would you be able to see all the items in the DataFrame? :)
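A small sketch of this behaviour, using a hypothetical four-row stand-in for `rdata`. It counts horse names in `DataFrame.to_string()`, which uses the default `display.multi_sparse=True` grouping. With four levels, `horse_name` is the last level and is printed on every row; appending `work_date` as a fifth level (an illustrative choice, not something from the question) means `horse_name` is no longer last, so repeated names are blanked like the outer levels.

```python
import pandas as pd

# Hypothetical stand-in for rdata: two workouts per horse, dates as strings.
rdata = pd.DataFrame({
    "race_date": ["2007-08-24"] * 4,
    "track_code": ["BM"] * 4,
    "race_number": [8] * 4,
    "horse_name": ["Count Me Twice", "Count Me Twice",
                   "Judge's Choice", "Judge's Choice"],
    "work_date": ["2007-05-31", "2007-06-09", "2007-06-07", "2007-06-14"],
    "work_track": ["PLN", "PLN", "BM", "BM"],
})

race_keys = ["race_date", "track_code", "race_number", "horse_name"]

# Four levels: horse_name is last, so it is printed on every row.
# If it were blanked too, the second row would show no index label at all.
four = rdata.set_index(race_keys)
print(four.index.is_unique)                      # False
print(four.to_string().count("Count Me Twice"))  # 2

# Five levels: horse_name is no longer last, so a repeated name is left blank.
five = rdata.set_index(race_keys + ["work_date"])
print(five.index.is_unique)                      # True
print(five.to_string().count("Count Me Twice"))  # 1
print(five)
```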

Try doing:

 rdata.set_index(['race_date', 'track_code', 'race_number'])

or:

 rdata.set_index(['race_date', 'track_code'])

You'll see that the last level of the index is always fully expanded, to enable you to see all the items in the DataFrame.
