Results differ whether using a list or a numpy array in scikit-learn

Question

I have a dataset, data, and an array of labels, target, which I use to build a supervised model in scikit-learn with the k-nearest neighbors algorithm.

from sklearn.neighbors import KNeighborsClassifier

neigh = KNeighborsClassifier()
neigh.fit(data, target)

I can now classify my training set with this same model. To get the classification score:

neigh.score(data, target)

Now my problem is that this score depends on the type of the target object.

If it is a Python list, that is, created with list() and filled in with target.append(), the score method returns 0.68.
If it is a NumPy array created with target = np.empty(shape=(length,1), dtype="S36") (it contains only 36-character strings) and filled in with target[k] = value, the score method returns 0.008.

To check whether the predictions themselves really differed, I created text files that list the results of

for k in data:
    neigh.predict(k)

in each case. The results were the same.

What can explain the difference in score?

What happens if you specify the array's shape as (length) only? So that its shape will be (length,) and not (length, 1)? — Harel, Jul 16 '13 at 10:56
@Harel, Thank you! That solved the problem. But I don't really understand why, how did you think about it ? — cardboard, Jul 16 '13 at 11:35
Nothing like experience... there's a difference in numpy between two-dimensional arrays with one dimension size = 1 and one-dimensional arrays. While they are often used interchangeably, they are not identical, and in this case this slight difference produced a problem. Is it ethical to ask for an upvote on my original comment? :) — Harel, Jul 17 '13 at 12:29
@Harel, would love to, but not enough reputation to upvote..! — cardboard, Jul 18 '13 at 14:24
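
The distinction described in the comments above can be checked directly in NumPy. This is a minimal sketch, assuming a small hypothetical length of 4 and a made-up label value; only the shapes matter here.

```python
import numpy as np

length = 4  # hypothetical size, for illustration only

# 2-d array with a single column, as in the question
column = np.empty(shape=(length, 1), dtype="S36")
# 1-d array, as suggested in the comments
flat = np.empty(shape=(length,), dtype="S36")

# Fill both with a placeholder label (made-up value)
column[:] = b"a"
flat[:] = b"a"

print(column.shape, column.ndim)  # (4, 1) 2
print(flat.shape, flat.ndim)      # (4,) 1

# Indexing one row of the 2-d array gives another array, not a single label
print(column[0].shape)            # (1,)

# ravel() flattens the column into a 1-d array of the same length
print(column.ravel().shape)       # (4,)
```

A Python list of labels converts to the 1-d form, which would fit the observation that the list version behaved as expected.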

larsmans · Accepted Answer · 2013-07-16 19:07:56Z

@Harel spotted the problem; here's the explanation:

np.empty(shape=(length, 1), dtype="S36")

creates an array of the wrong shape. scikit-learn estimators almost invariably expect the target to be a 1-d array, i.e. shape=length, which gives an array of shape (length,) rather than (length, 1). The fact that this doesn't raise an exception is an oversight.
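
One plausible mechanism for the low score, sketched in plain NumPy rather than taken from scikit-learn's source: if accuracy is computed as the mean of an elementwise comparison, then comparing 1-d predictions against an (n, 1) target broadcasts to an n-by-n matrix. The mean then measures the fraction of all prediction/label pairs that match, not the fraction of samples predicted correctly. The label values and the helper name naive_accuracy below are hypothetical.

```python
import numpy as np


def naive_accuracy(y_true, y_pred):
    """Mean of an elementwise comparison (illustrative, not scikit-learn's code)."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))


# Made-up labels: three of the four predictions are correct
y_pred = np.array([b"a", b"b", b"c", b"d"])   # shape (4,)
y_flat = np.array([b"a", b"b", b"c", b"x"])   # shape (4,)
y_column = y_flat.reshape(-1, 1)              # shape (4, 1), as in the question

# Broadcasting (4, 1) against (4,) compares every label with every prediction
print((y_column == y_pred).shape)             # (4, 4)

acc_flat = naive_accuracy(y_flat, y_pred)              # 3 of 4 samples match
acc_column = naive_accuracy(y_column, y_pred)          # 3 of 16 pairs match
acc_fixed = naive_accuracy(y_column.ravel(), y_pred)   # flattened target

print(acc_flat)    # 0.75
print(acc_column)  # 0.1875
print(acc_fixed)   # 0.75
```

With many distinct labels, as with 36-character strings, the fraction of matching pairs is small, which would be consistent with a score like 0.008. Creating the target with shape=(length,) or passing target.ravel() avoids the problem. More recent scikit-learn releases may warn about a column-vector target or flatten it themselves, so the effect may not reproduce there.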

asked	5 months ago
viewed	84 times
active	5 months ago
