Classification test in Scikit-learn, ValueError: setting an array element with a sequence

Question

Following the tutorial on multiclass AdaBoost, I'm trying to classify some images that fall into two classes (I don't see why the algorithm shouldn't work if the problem is binary). Later I'm going to extend my samples to include other classes.

My current test is quite small, only 17 images in all, 10 for training, 7 for testing.

For now I have two classes: 0 (no vehicle) and 1 (vehicle present). I used integer labels because, according to the example in the link above, the training data consists of integer-based labels.

I've edited the provided example only a bit, to include my own image files, but I'm getting an error.

Traceback (most recent call last):
  File "C:\Users\app\Documents\Python Scripts\carclassify.py", line 66, in <module>
    bdt_discrete.fit(X_train, y_train)
  File "C:\Users\app\Anaconda\lib\site-packages\sklearn\ensemble\weight_boosting.py", line 389, in fit
    return super(AdaBoostClassifier, self).fit(X, y, sample_weight)
  File "C:\Users\app\Anaconda\lib\site-packages\sklearn\ensemble\weight_boosting.py", line 99, in fit
    X = np.ascontiguousarray(array2d(X), dtype=DTYPE)
  File "C:\Users\app\Anaconda\lib\site-packages\numpy\core\numeric.py", line 408, in ascontiguousarray
    return array(a, dtype, copy=False, order='C', ndmin=1)
ValueError: setting an array element with a sequence.

The following is my code, adapted from the example on the scikit-learn website:

import numpy as np
import pylab as pl
from skimage import color, io
from skimage.feature import hog
from skimage.transform import resize
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# read the image filenames, one per line
with open("PATH_TO_SAMPLES\\samples.txt", 'r') as f:
    out = f.read().splitlines()

imgs = []
tmp_hogs = []
# 13 of the images are with vehicles, 4 are without
labels = [1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0]

for file in out:
    filepath = "C:\PATH_TO_SAMPLE_IMAGES\\" + file
    curr_img = color.rgb2gray(io.imread(filepath))
    imgs.append(resize(curr_img,(60,40)))
    fd, hog_image = hog(curr_img, orientations=8, pixels_per_cell=(16, 16),
                        cells_per_block=(1, 1), visualise=True)
    tmp_hogs.append(fd)

img_hogs = np.array(tmp_hogs)
n_split = 10
X_train, X_test = img_hogs[:n_split], img_hogs[n_split:] # first ten images, all with vehicles
y_train, y_test = labels[:n_split], labels[n_split:] # test set: 3 images with vehicles, 4 without

#now all the code below is straight off the example on scikit-learn's website

bdt_real = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),
    n_estimators=600,
    learning_rate=1)

bdt_discrete = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),
    n_estimators=600,
    learning_rate=1.5,
    algorithm="SAMME")

bdt_real.fit(X_train, y_train)
bdt_discrete.fit(X_train, y_train)

real_test_errors = []
discrete_test_errors = []

for real_test_predict, discrete_train_predict in zip(
        bdt_real.staged_predict(X_test), bdt_discrete.staged_predict(X_test)):
    real_test_errors.append(
        1. - accuracy_score(real_test_predict, y_test))
    discrete_test_errors.append(
        1. - accuracy_score(discrete_train_predict, y_test))

n_trees = xrange(1, len(bdt_discrete) + 1)

pl.figure(figsize=(15, 5))

pl.subplot(131)
pl.plot(n_trees, discrete_test_errors, c='black', label='SAMME')
pl.plot(n_trees, real_test_errors, c='black',
        linestyle='dashed', label='SAMME.R')
pl.legend()
pl.ylim(0.18, 0.62)
pl.ylabel('Test Error')
pl.xlabel('Number of Trees')

pl.subplot(132)
pl.plot(n_trees, bdt_discrete.estimator_errors_, "b", label='SAMME', alpha=.5)
pl.plot(n_trees, bdt_real.estimator_errors_, "r", label='SAMME.R', alpha=.5)
pl.legend()
pl.ylabel('Error')
pl.xlabel('Number of Trees')
pl.ylim((.2,
        max(bdt_real.estimator_errors_.max(),
            bdt_discrete.estimator_errors_.max()) * 1.2))
pl.xlim((-20, len(bdt_discrete) + 20))

pl.subplot(133)
pl.plot(n_trees, bdt_discrete.estimator_weights_, "b", label='SAMME')
pl.legend()
pl.ylabel('Weight')
pl.xlabel('Number of Trees')
pl.ylim((0, bdt_discrete.estimator_weights_.max() * 1.2))
pl.xlim((-20, len(bdt_discrete) + 20))

# prevent overlapping y-axis labels
pl.subplots_adjust(wspace=0.25)
pl.show()

Edit

I typed

print tmp_hogs

and the output was this:

[ array([ 0.27621208,  0.11038658,  0.10698133, ...,  0.08661556,        0.04612063,  0.0280782 ]), 
        array([  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ..., -1.29909838e-15,  -7.01780982e-17,  -1.24900943e-15]), 
        array([ 0.0503603 ,  0.1497235 ,  0.2372957 , ...,  0.07249325, 0.04545541,  0.00903818]), 
        array([ 0.27299191,  0.13122109,  0.0719268 , ...,  0.0848522 ,  0.04789403,  0.01387038]), 
        array([  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,  3.32140617e-17,  -6.58924128e-17,  -6.23567224e-16]), 
        array([ 0.37431874,  0.18094303,  0.01219871, ...,  0.06501856, 0.04855516,  0.02439321]), 
        array([ 0.41087302,  0.16478851,  0.03396399, ...,  0.09511273, 0.04077713,  0.03945513]), 
        array([ 0.17753915,  0.07025565,  0.09136909, ...,  0.03396507, 0.01379266,  0.01645722]), 
        array([ 0.40605587,  0.05915388,  0.03767763, ...,  0.08981079, 0.05452031,  0.01725399]), 
        array([ 0.        ,  0.        ,  0.        , ...,  0.00579303, 0.02053979,  0.0019091 ]), 
        array([ 0.31550735,  0.11988131,  0.07716529, ...,  0.09815158, 0.03058497,  0.02236517]), 
        array([  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ..., -3.51175682e-16,   1.31619418e-03,   2.86127901e-16]), 
        array([ 0.21381704,  0.22352378,  0.11568828, ...,  0.06311083, 0.02696666,  0.00402261]), 
        array([ 0.17480064,  0.1469145 ,  0.16336016, ...,  0.05614001, 0.03244093,  0.00524034]), 
        array([ 0.        ,  0.        ,  0.        , ...,  0.03089959, 0.00509584,  0.00247698]), 
        array([ 0.04711166,  0.0218663 ,  0.05316   , ...,  0.04214594, 0.04892439,  0.25840958]), 
        array([ 0.05357464,  0.00530857,  0.07162301, ...,  0.06802692, 0.08331959,  0.26619977])]

Then I ran

print img_hogs

and the output was:

[ array([ 0.27621208,  0.11038658,  0.10698133, ...,  0.08661556, 0.04612063,  0.0280782 ])
 array([  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ..., -1.29909838e-15,  -7.01780982e-17,  -1.24900943e-15])
 array([ 0.0503603 ,  0.1497235 ,  0.2372957 , ...,  0.07249325, 0.04545541,  0.00903818])
 array([ 0.27299191,  0.13122109,  0.0719268 , ...,  0.0848522 , 0.04789403,  0.01387038])
 array([  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ..., 3.32140617e-17,  -6.58924128e-17,  -6.23567224e-16])
 array([ 0.37431874,  0.18094303,  0.01219871, ...,  0.06501856, 0.04855516,  0.02439321])
 array([ 0.41087302,  0.16478851,  0.03396399, ...,  0.09511273, 0.04077713,  0.03945513])
 array([ 0.17753915,  0.07025565,  0.09136909, ...,  0.03396507, 0.01379266,  0.01645722])
 array([ 0.40605587,  0.05915388,  0.03767763, ...,  0.08981079, 0.05452031,  0.01725399])
 array([ 0.        ,  0.        ,  0.        , ...,  0.00579303, 0.02053979,  0.0019091 ])
 array([ 0.31550735,  0.11988131,  0.07716529, ...,  0.09815158, 0.03058497,  0.02236517])
 array([  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ..., -3.51175682e-16,   1.31619418e-03,   2.86127901e-16])
 array([ 0.21381704,  0.22352378,  0.11568828, ...,  0.06311083, 0.02696666,  0.00402261])
 array([ 0.17480064,  0.1469145 ,  0.16336016, ...,  0.05614001, 0.03244093,  0.00524034])
 array([ 0.        ,  0.        ,  0.        , ...,  0.03089959, 0.00509584,  0.00247698])
 array([ 0.04711166,  0.0218663 ,  0.05316   , ...,  0.04214594, 0.04892439,  0.25840958])
 array([ 0.05357464,  0.00530857,  0.07162301, ...,  0.06802692, 0.08331959,  0.26619977])]

Quite independently of the error: 17 samples is decidedly not enough to do anything meaningful. Why don't you download a standard image database and try it on that? An easy, well organized one is Caltech101. — eickenberg, Apr 11 '14 at 11:02
Could you show what tmp_hogs and what img_hogs looks like? — eickenberg, Apr 11 '14 at 12:52
Sure! I've edited the question to include the outputs at the end. — user961627, Apr 11 '14 at 13:21
The second output doesn't look right (it is a copy of the first). It should say array([... ... ... ...], dtype= ...) — eickenberg, Apr 11 '14 at 13:52
I just tried img_hogs = np.array(tmp_hogs, dtype =float), but it gave the same error, and on this line in fact. — user961627, Apr 11 '14 at 14:25
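
The img_hogs printed above is a 1-D array whose elements are themselves arrays, not a 2-D float array, which typically happens when the per-image descriptors differ in length. A minimal sketch for confirming this, written for Python 3 and a recent NumPy (the question itself uses Python 2); the array lengths are stand-ins and feature_lengths is a hypothetical helper, not part of any library:

```python
import numpy as np


def feature_lengths(rows):
    """Return the sorted distinct lengths found among the feature vectors."""
    return sorted({len(row) for row in rows})


# Stand-in descriptors with mismatched lengths (assumed values for illustration).
ragged = [np.zeros(1728), np.zeros(2080), np.zeros(1728)]
print(feature_lengths(ragged))  # more than one length means the rows are ragged

# Forcing a float dtype on ragged rows raises the ValueError from the question.
try:
    np.array(ragged, dtype=float)
    stacked_ok = True
except ValueError as exc:
    stacked_ok = False
    print("ValueError:", exc)

# Rows of equal length stack into an (n_samples, n_features) matrix.
uniform = [np.zeros(48) for _ in range(3)]
X = np.vstack(uniform)
print(X.shape)
```

If feature_lengths reports a single value, the rows can be stacked and passed to fit; if it reports several, the feature extraction step needs fixing first.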

Abhishek Thakur · Accepted Answer · 2014-04-11 14:26:22Z

Try preallocating the feature array and filling it row by row:

imgs = []
tmp_hogs = np.zeros((17, 256))
# 13 of the images are with vehicles, 4 are without
labels = [1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0]

for i, file in enumerate(out):
    filepath = "C:\PATH_TO_SAMPLE_IMAGES\\" + file
    curr_img = color.rgb2gray(io.imread(filepath))
    imgs.append(resize(curr_img,(60,40)))
    fd, hog_image = hog(curr_img, orientations=8, pixels_per_cell=(16, 16),
                        cells_per_block=(1, 1), visualise=True)
    tmp_hogs[i,:] = fd

img_hogs = tmp_hogs

Just tried this, but on the line tmp_hogs[i,:] = fd I get the following error: ValueError: could not broadcast input array from shape (1728) into shape (256). So I adjusted the tmp_hogs declaration and gave it 1728 columns, and then I got the same error again, this time saying it couldn't "broadcast from 1728 to 2080". So I'm guessing this means that my tmp_hogs doesn't have the same number of columns in each row? But I resized all images to 60,40! So how could this be? – user961627 Apr 11 '14 at 14:35

Limit the number of HOG features; it's not consistent. – Abhishek Thakur Apr 11 '14 at 14:45

How so? I thought resizing the image to the same size would limit this. – user961627 Apr 11 '14 at 15:28
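
A likely explanation for the differing lengths, judging from the code in the question: the resized image is appended to imgs, but hog is still called on curr_img, the image at its original size. The descriptor length depends on the image shape, so source images of different sizes give rows of different lengths. The sketch below estimates that length in plain Python. The helper name hog_feature_length is hypothetical, and the formula (whole cells only, blocks sliding one cell at a time) is an assumption about how skimage's hog lays out its output, not a quote from its documentation. The 192x288 and 208x320 shapes are made-up examples, chosen only because they reproduce the two lengths reported above.

```python
def hog_feature_length(image_shape, orientations=8,
                       pixels_per_cell=(16, 16), cells_per_block=(1, 1)):
    """Estimate the HOG descriptor length for a 2-D image shape.

    Assumes only whole cells are counted and that blocks slide across
    the cell grid one cell at a time.
    """
    rows, cols = image_shape
    cell_rows, cell_cols = pixels_per_cell
    block_rows, block_cols = cells_per_block

    # Number of whole cells that fit along each axis.
    n_cells_r = rows // cell_rows
    n_cells_c = cols // cell_cols

    # Number of block positions along each axis.
    n_blocks_r = n_cells_r - block_rows + 1
    n_blocks_c = n_cells_c - block_cols + 1

    return n_blocks_r * n_blocks_c * block_rows * block_cols * orientations


# The target size from the question, plus two hypothetical original sizes.
for shape in [(60, 40), (192, 288), (208, 320)]:
    print(shape, hog_feature_length(shape))
```

Under these assumptions a 60x40 image would yield 48 features, not 256 or 1728, which suggests the descriptors were computed on the unresized images. Passing the resized image to hog, for example hog(resize(curr_img, (60, 40)), ...), should give every row the same length.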
