I have a pandas DataFrame whose column names are the strings '0' through '9':

import numpy as np
import pandas as pd

working_df = pd.DataFrame(np.random.rand(5,10),index=range(0,5), columns=[str(x) for x in range(10)])
working_df.loc[:,'outcome'] = [0,1,1,0,1]

I then wanted to collect each row's ten values into a single array, stored in one column, so I did:

array_list = [Y for Y in working_df[[str(num) for num in range(10)]].values]

which gave me:

[array([ 0.0793451 ,  0.3288617 ,  0.75887129,  0.01128641,  0.64105905,
         0.78789297,  0.69673768,  0.20354558,  0.48976411,  0.72848541]),
 array([ 0.53511388,  0.08896322,  0.10302786,  0.08008444,  0.18218731,
         0.2342337 ,  0.52622153,  0.65607384,  0.86069294,  0.8864577 ]),
 array([ 0.82878026,  0.33986175,  0.25707122,  0.96525733,  0.5897311 ,
         0.3884232 ,  0.10943644,  0.26944414,  0.85491211,  0.15801284]),
 array([ 0.31818888,  0.0525836 ,  0.49150727,  0.53682492,  0.78692193,
         0.97945708,  0.53181293,  0.74330327,  0.91364064,  0.49085287]),
 array([ 0.14909577,  0.33959452,  0.20607263,  0.78789116,  0.41780657,
         0.0437907 ,  0.67697385,  0.98579928,  0.1487507 ,  0.41682309])]

I then attached it to my dataframe using:

working_df.loc[:,'array_list'] = pd.Series(array_list)

I then set up rf_clf = RandomForestClassifier() and tried to call rf_clf.fit(working_df['array_list'][1:].values, working_df['outcome'][1:].values), which raises ValueError: setting an array element with a sequence.

Is the problem the array of arrays that I pass to fit? Thanks for any insight.

Please could you show the full error traceback in your question so that we can see where exactly the exception is being raised – ali_m Oct 21 '15 at 20:50
Accepted answer:

The problem is that scikit-learn expects a two-dimensional array of values as input, with shape (n_samples, n_features). You're passing a one-dimensional array of objects, where each object is itself a one-dimensional array.

A quick fix would be to do this:

X = np.array(list(working_df['array_list'][1:]))
y = working_df['outcome'][1:].values
rf_clf.fit(X, y)
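
To see why the conversion matters, the shapes can be compared directly. This is only a minimal sketch: the names rng, features, col, as_objects and as_matrix are made up for illustration, and the random data stands in for the frame from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 5 samples with 10 features each
rng = np.random.RandomState(0)
features = rng.rand(5, 10)

# A column of arrays, like working_df['array_list'] in the question
col = pd.Series(list(features))

# .values yields a 1-D object array; each element is a separate 10-element array
as_objects = col.values
print(as_objects.shape, as_objects.dtype)

# Rebuilding from a list of rows yields a real 2-D float array
as_matrix = np.array(list(col))
print(as_matrix.shape, as_matrix.dtype)
```

The first array has only one dimension from NumPy's point of view, so an estimator cannot read it as samples by features; the second has the two-dimensional layout that fit expects.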

A better fix would be to avoid storing your two-dimensional feature array inside a one-dimensional pandas column in the first place, and to build the input for fit from the original feature columns instead.
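
One way to do that, sketched here under the assumption that the ten feature columns from the question are still present in the frame, is to select them directly. The name feature_cols is introduced only for this example, and the [1:] slice simply mirrors the one used in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; feature_cols is a hypothetical helper name
feature_cols = [str(x) for x in range(10)]
working_df = pd.DataFrame(np.random.rand(5, 10), columns=feature_cols)
working_df['outcome'] = [0, 1, 1, 0, 1]

# Selecting the feature columns gives a 2-D float array with no object column involved
X = working_df[feature_cols].values[1:]
y = working_df['outcome'].values[1:]

print(X.shape, y.shape)
# X and y can then be passed to rf_clf.fit(X, y) as in the quick fix
```

This keeps the features as ordinary numeric columns, so no conversion step is needed before fitting.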

thanks! your videos are what got me started on scikit-learn.. thanks for the tip – nahata5 Oct 22 '15 at 0:42
