scikit-learn is a machine learning library for Python, built on NumPy and SciPy
1
vote
0 answers
8 views
incorporating emoticons into a scikit model
I'm training an SVM classifier on a text dataset using scikit-learn. The documentation is good on using a count vectorizer to construct a feature vector from n-grams. E.g., for unigrams and bigrams, I ...
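One way to approach the emoticon question is to tokenize the text so that emoticons survive as their own tokens before n-grams are built. The sketch below is pure Python and is not taken from the question: the regular expression, the function names `tokenize` and `ngrams`, and the sample sentence are all assumptions made for illustration, and the pattern only covers a handful of common Western emoticons.

```python
import re

# Hypothetical pattern: try a small set of emoticons first (eyes, optional
# nose, mouth), then fall back to runs of word characters.
TOKEN_RE = re.compile(r"[:;=][\-o']?[)(\]\[dDpP/\\|]|\w+")

def tokenize(text):
    """Split text into word tokens and emoticon tokens, in order."""
    return TOKEN_RE.findall(text)

def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams of the token list for n_min <= n <= n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

# Made-up example sentence.
tokens = tokenize("great movie :) loved it")
features = ngrams(tokens)
print(tokens)
print(features)
```

A callable like `tokenize` could be passed as the `tokenizer` argument of `CountVectorizer`, together with `ngram_range=(1, 2)`, so that the vectorizer builds unigrams and bigrams over these tokens instead of dropping the punctuation.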
0
votes
0 answers
10 views
Results differ whether using a list or a numpy array in scikit-learn
I have a dataset, data, and a labeled array, target, with which I build a supervised model in scikit-learn using the k-Nearest Neighbors algorithm.
neigh = KNeighborsClassifier()
neigh.fit(data, ...
0
votes
1 answer
22 views
Sklearn naive bayes classifier for data belonging to the same class
I ran this simple naive bayes program:
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
from sklearn.naive_bayes import ...
1
vote
0 answers
17 views
Scikit-learn: role of weights in Ridge Regression
I am using the scikit-learn library to perform Ridge Regression with weights on individual samples. This can be done by: estimator.fit(X, y, sample_weight=some_array). Intuitively, I expect that larger ...
-1
votes
0 answers
12 views
Relative influence of features on kmeans in scikit-learn [on hold]
In scikit-learn KMeans, can I determine those features and their relative influence on the cluster separation? In SPSS for example, I can run an ANOVA with the cluster and show an F score that will ...
0
votes
0 answers
21 views
Fixing Memory Leak in Django + Scikit-learn
How do you diagnose and fix memory leaks involving Django and Scikit-learn?
I'm working on a Django management command that trains several text classifiers implemented using scikit-learn. I'm using ...
1
vote
0 answers
23 views
Issue when building scikit-learn on Windows 7
I've installed all dependencies for scikit-learn, but when I run
python setup.py build
or
python setup.py install
commands I get the following error.
...
2
votes
1 answer
54 views
How to find the Precision, Recall, Accuracy using SVM?
Duplicate: calculating Precision, Recall and F Score
I have an input file with text descriptions and classified levels (i.e. levelA and levelB). I want to write an SVM classifier that measures precision, ...
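For reference, precision, recall and accuracy can be computed directly from true and predicted labels, whatever classifier produced the predictions. The sketch below is pure Python; the function name `binary_metrics` and the label lists are made up for illustration, with `levelA` treated as the positive class by assumption.

```python
def binary_metrics(y_true, y_pred, positive):
    """Return (precision, recall, accuracy) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(pairs)
    return precision, recall, accuracy

# Hypothetical labels, not data from the question.
y_true = ["levelA", "levelA", "levelB", "levelB", "levelA"]
y_pred = ["levelA", "levelB", "levelB", "levelA", "levelA"]

precision, recall, accuracy = binary_metrics(y_true, y_pred, positive="levelA")
print(precision, recall, accuracy)
```

In scikit-learn itself, `sklearn.metrics` provides `precision_score`, `recall_score` and `accuracy_score`, which take the same pair of label sequences.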
0
votes
1 answer
33 views
How to efficiently serialize a scikit-learn classifier
What's the most efficient way to serialize a scikit-learn classifier?
I'm currently using Python's standard Pickle module to serialize a text classifier, but this results in a monstrously large ...
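One option that stays within the standard library is to compress the pickle stream. The sketch below is only an illustration under assumptions: `model_like` is a hypothetical stand-in for a fitted text classifier (a large vocabulary plus a weight list), the helper names are made up, and how much space this saves on a real model depends on its contents.

```python
import gzip
import pickle

def dumps_compressed(obj):
    """Pickle an object with the newest protocol, then gzip the bytes."""
    return gzip.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

def loads_compressed(blob):
    """Inverse of dumps_compressed."""
    return pickle.loads(gzip.decompress(blob))

# Hypothetical stand-in for a classifier: many string keys and a weight list.
model_like = {
    "vocabulary": {f"token{i}": i for i in range(10000)},
    "weights": [0.0] * 10000,
}

raw = pickle.dumps(model_like, protocol=pickle.HIGHEST_PROTOCOL)
packed = dumps_compressed(model_like)
print(len(raw), len(packed))
```

For models that are mostly large NumPy arrays, `joblib.dump` with its `compress` argument is another route worth comparing against this.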
0
votes
0 answers
30 views
Lasso Regression with weighted input in Python
I am interested in using the Lasso regression technique in Python. However, I would like to weight the input data for the algorithm.
Can you suggest some libraries that can perform the Lasso regression ...
0
votes
1 answer
25 views
Scikit-Learn SVC hangs on small data set
I'm trying to use scikit-learn to fit an SVM to my data. However, Python hangs on the last line below when I try to fit the data. I let this run for 12 hours before killing it. trainX has 100 ...
0
votes
1 answer
66 views
One hot encoder confusion
This is what I have done. I think there is something going on with One hot encoder.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
X, y = ...
1
vote
1 answer
56 views
How can I reduce memory usage of Scikit-Learn Vectorizers?
TfidfVectorizer takes so much memory: vectorizing 100k documents (470 MB) takes over 6 GB. If we go to 21 million documents, it will not fit in the 60 GB of RAM we have.
So we go for HashingVectorizer but ...
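The memory saving from hashing comes from never storing a vocabulary: each token is mapped straight to one of a fixed number of columns. The sketch below illustrates that idea in pure Python and is not scikit-learn's implementation. It uses MD5 as a stand-in hash (scikit-learn uses MurmurHash3), ignores the sign trick, and the function name `hashed_counts` and the `n_features` value are assumptions.

```python
import hashlib

def hashed_counts(text, n_features=16):
    """Count tokens into a fixed-size vector using a hash of each token."""
    vec = [0] * n_features
    for token in text.lower().split():
        # Stable hash (Python's built-in hash() is salted per process).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_features
        vec[index] += 1  # distinct tokens may collide in the same column
    return vec

v = hashed_counts("the cat sat on the mat")
print(v)
```

The trade-off is that the mapping cannot be inverted: with no stored vocabulary, there is no way to look up which token a column came from, and statistics that need a pass over the whole corpus (such as IDF weights) have to be computed separately.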
1
vote
0 answers
22 views
NaN/inf values in scikit-learn manifold learning functions
I have a manifold learning / non-linear dimensionality reduction problem where I know distances between objects up to some threshold, and beyond that I just know that the distance is "far". Also, in ...
1
vote
0 answers
19 views
scikit 0.14 multi label metrics
I just installed scikit 0.14 so that I could explore the multi-label metrics improvements. I got some positive results with the hamming loss metrics and the classification report, but was not able to ...