Scikit-learn ValueError when implementing logistic regression in Python

Question

I am new to machine learning and am trying to set up a logistic regression for prediction purposes in Python using scikit-learn. I already set one up with a small, mock dataset, but when expanding this code to work for larger datasets, I run into an issue regarding a ValueError. Here is my code:

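import numpy as np
from sklearn.linear_model import LogisticRegression

# file and file2 hold the paths to the feature file and the label file
inputData = np.genfromtxt(file, skip_header=1, unpack=True)
print("X array shape: ", inputData.shape)
inputAnswers = np.genfromtxt(file2, skip_header=1, unpack=True)
print("Y array shape: ", inputAnswers.shape)

logreg = LogisticRegression(penalty='l2', C=2.0)
logreg.fit(inputData, inputAnswers)  # raises the ValueError shown below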

The inputData 2D array (matrix) has 149 rows and 231 columns. I'm trying to fit it to the inputAnswers array, which has 149 rows, correctly corresponding to the 149 rows of the inputData array. However, here is the output I receive:

X array shape:  (231, 149)
Y array shape:  (149,)
Traceback (most recent call last):
  File "LogRegTry_rawData.py", line 26, in <module>
    logreg.fit(inputData, inputAnswers)
  File "[path]", line 676, in fit
    (X.shape[0], y.shape[0]))
ValueError: X and y have incompatible shapes.
X has 231 samples, but y has 149.

I understand what the error means, but I'm not sure why it shows up in this situation or how to fix it. Any help is greatly appreciated. Thank you!

ojy · Accepted Answer · 2014-07-26 00:32:57Z


In shape, the first element is the number of rows and the second is the number of columns. scikit-learn treats each row of X as one sample, so as loaded you have 231 samples but only 149 labels. Try transposing your data: inputData.T
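A minimal sketch of the fix, using random synthetic data in place of the files from the question (the values, the binary labels, and the names X_transposed, X, y are assumptions for illustration only). max_iter is raised here only to avoid convergence warnings on the random data; it is not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the question's data (assumption: random values,
# binary labels): 149 samples with 231 features each.
rng = np.random.RandomState(0)
n_samples, n_features = 149, 231
X_transposed = rng.rand(n_features, n_samples)  # shape (231, 149), as in the question
y = rng.randint(0, 2, size=n_samples)           # shape (149,)

# scikit-learn expects X with shape (n_samples, n_features),
# so transpose the array before fitting.
X = X_transposed.T
print("X array shape: ", X.shape)
print("Y array shape: ", y.shape)

logreg = LogisticRegression(penalty='l2', C=2.0, max_iter=1000)
logreg.fit(X, y)  # the row counts now match, so no ValueError
predictions = logreg.predict(X)
```

After the transpose, X.shape[0] equals y.shape[0], which is the condition the error message is checking.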


Thank you! I used the np.transpose() function, and this worked. I wonder why np.genfromtxt reads it "inverted", however... – user3847447 Jul 26 '14 at 0:44

unpack=True is transposing the data – ojy Jul 26 '14 at 0:49
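To illustrate the comment above, here is a small sketch that reads the same text with and without unpack=True. The in-memory text is a hypothetical stand-in for the real file (a header line followed by 3 rows of 2 columns).

```python
import io
import numpy as np

# Hypothetical file contents: one header line, then 3 rows with 2 columns each.
text = "a b\n1 2\n3 4\n5 6\n"

# Default behaviour: rows of the file become rows of the array.
as_stored = np.genfromtxt(io.StringIO(text), skip_header=1)

# With unpack=True the result is transposed: one row per file column.
unpacked = np.genfromtxt(io.StringIO(text), skip_header=1, unpack=True)

print(as_stored.shape)  # (3, 2)
print(unpacked.shape)   # (2, 3)
```

So for the feature matrix, dropping unpack=True from the np.genfromtxt call should give the (149, 231) layout directly, without a separate transpose.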


