scikit-learn is a machine learning library for Python, built on NumPy and SciPy.

1 vote | 0 answers | 8 views

incorporating emoticons into a scikit model

I'm training an SVM classifier on a text dataset using scikit-learn. The documentation is good for using a count vectorizer to construct a feature vector using n-grams. E.g., for unigrams and bigrams, I ...
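
A minimal sketch of one option, assuming the emoticons should simply become tokens alongside words: give CountVectorizer a custom token_pattern that tries a few emoticon shapes before falling back to a word pattern, and keep ngram_range=(1, 2) so emoticons also appear in bigrams. The EMOTICON regex and the sample documents are hypothetical and deliberately small; a real emoticon list would need more shapes.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical emoticon regex: eyes, an optional nose, then a mouth, e.g. ":)", ";-(", ":D".
# It has no capturing groups, because CountVectorizer runs re.findall with this pattern.
EMOTICON = r"[:;=][-o^]?[)(\]\[dDpP/]"
# Try an emoticon first; otherwise match a word of 2+ characters (the default word rule).
TOKEN_PATTERN = EMOTICON + r"|\b\w\w+\b"

docs = ["great movie :)", "worst film ever :(", "so fun :D"]

# Unigrams and bigrams; lowercase=True (the default) turns ":D" into ":d" before tokenizing.
vectorizer = CountVectorizer(token_pattern=TOKEN_PATTERN, ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)
vocab = vectorizer.vocabulary_

print(sorted(vocab))
```

The resulting sparse matrix X can be passed to an SVM exactly like one built with the default pattern.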

0 votes | 0 answers | 10 views

Results differ depending on whether I use a list or a NumPy array in scikit-learn

I have a dataset, data, and a label array, target, with which I build a supervised model in scikit-learn using the k-nearest neighbors algorithm:
neigh = KNeighborsClassifier()
neigh.fit(data, ...
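
scikit-learn converts list input to a NumPy array before fitting, so identical numbers should give identical predictions either way. A minimal diagnostic sketch, with hypothetical toy values standing in for data and target: look at the dtype and shape the list turns into (an object dtype or an unexpected shape, e.g. from ragged rows, is one thing worth ruling out), then compare predictions from both inputs.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-ins for the question's data and target.
data_list = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.2, 0.1]]
target = [0, 0, 1, 1, 0]
data_array = np.asarray(data_list)

# Check what the list becomes once converted; dtype=object or an odd shape would be suspicious.
print(data_array.dtype, data_array.shape)

# Fit one model per input type and predict on the same points.
pred_list = KNeighborsClassifier(n_neighbors=3).fit(data_list, target).predict(data_list)
pred_array = KNeighborsClassifier(n_neighbors=3).fit(data_array, target).predict(data_array)
print((pred_list == pred_array).all())
```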

0 votes | 1 answer | 22 views

Sklearn naive bayes classifier for data belonging to the same class

I ran this simple naive Bayes program:
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
from sklearn.naive_bayes import ...
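
The excerpt cuts off at the import, so this sketch assumes GaussianNB (the X and Y above match the GaussianNB example in the scikit-learn documentation). It fits the model and prints the per-class feature means it learned, which is where each class's Gaussian is centered, followed by one prediction.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB  # assumption: the elided import is GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

clf = GaussianNB()
clf.fit(X, Y)

# theta_ holds the per-class mean of each feature; rows follow clf.classes_.
print(clf.classes_, clf.theta_)
print(clf.predict([[-0.8, -1]]))
```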

1 vote | 0 answers | 17 views

Scikit-learn: role of weights in Ridge Regression

I am using the library scikit-learn to perform Ridge Regression with weights on individual samples. This can be done by: estimator.fit(X, y, sample_weight=some_array). Intuitively, I expect that larger ...
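
With sample_weight, Ridge minimizes sum_i w_i * (y_i - x_i.beta - b)^2 + alpha * ||beta||^2, so a larger weight pulls the fit harder toward that sample, and an integer weight of 2 is equivalent to including the sample twice. A small check on synthetic data (the data, alpha, and the choice of which sample to up-weight are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(20)

# Give the first sample twice the weight of the others.
weights = np.ones(20)
weights[0] = 2.0
weighted = Ridge(alpha=1.0).fit(X, y, sample_weight=weights)

# Same model without weights, but with the first sample repeated once.
X_dup = np.vstack([X, X[:1]])
y_dup = np.concatenate([y, y[:1]])
duplicated = Ridge(alpha=1.0).fit(X_dup, y_dup)

print(weighted.coef_, duplicated.coef_)
```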

-1 votes | 0 answers | 12 views

Relative influence of features on kmeans in scikit-learn [on hold]

In scikit-learn's KMeans, can I determine the features and their relative influence on the cluster separation? In SPSS, for example, I can run an ANOVA with the cluster and show an F score that will ...
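
One analogue of the SPSS approach, sketched under the assumption that a per-feature one-way ANOVA is what is wanted: run KMeans, then pass the cluster labels to sklearn.feature_selection.f_classif, which returns an F statistic and p-value for each feature. Because the clusters were fitted on the same data, the p-values are optimistic, so the F scores are best read as a descriptive ranking. The blob data and the added noise feature are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.feature_selection import f_classif

# Hypothetical data: 3 blobs in 4 dimensions, then feature 3 replaced by pure noise.
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X[:, 3] = np.random.RandomState(1).randn(300)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# One-way ANOVA of each feature against the cluster labels.
F, p = f_classif(X, labels)
for i, (f_val, p_val) in enumerate(zip(F, p)):
    print(f"feature {i}: F={f_val:.1f}, p={p_val:.3g}")
```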

0 votes | 0 answers | 21 views

Fixing Memory Leak in Django + Scikit-learn

How do you diagnose and fix memory leaks involving Django and scikit-learn? I'm working on a Django management command that trains several text classifiers implemented using scikit-learn. I'm using ...

1 vote | 0 answers | 23 views

Issue when building scikit-learn on Windows 7

I've installed all dependencies for scikit-learn. But when I run the python setup.py build or python setup.py install commands, I get the following error. ...

2 votes | 1 answer | 54 views

How to find the Precision, Recall, Accuracy using SVM?

Possible duplicate: Calculating Precision, Recall and F Score. I have an input file with text descriptions and a classified level (i.e. levelA and levelB). I want to write an SVM classifier that measures precision, ...
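
A minimal sketch with the metric functions in sklearn.metrics, assuming TF-IDF features and a linear-kernel SVC; the training and test texts are hypothetical stand-ins for the input file. For string labels in a binary problem, pos_label says which level counts as "positive" for precision and recall, and classification_report prints both classes at once.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, precision_score, recall_score
from sklearn.svm import SVC

# Hypothetical stand-ins for the descriptions and levels in the input file.
train_texts = ["low risk routine task", "simple routine check", "critical failure urgent", "urgent critical outage"]
train_labels = ["levelB", "levelB", "levelA", "levelA"]
test_texts = ["routine simple task", "urgent outage"]
test_labels = ["levelB", "levelA"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learn the vocabulary on training data only
clf = SVC(kernel="linear").fit(X_train, train_labels)

pred = clf.predict(vectorizer.transform(test_texts))

acc = accuracy_score(test_labels, pred)
prec = precision_score(test_labels, pred, pos_label="levelA")
rec = recall_score(test_labels, pred, pos_label="levelA")
print(acc, prec, rec)
print(classification_report(test_labels, pred))
```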

0 votes | 1 answer | 33 views

How to efficiently serialize a scikit-learn classifier

What's the most efficient way to serialize a scikit-learn classifier? I'm currently using Python's standard pickle module to serialize a text classifier, but this results in a monstrously large ...
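
One commonly used alternative is joblib.dump with the compress argument (joblib is installed as a scikit-learn dependency), which can shrink models that hold large NumPy arrays. For text pipelines, the fitted vectorizer's stop_words_ attribute (terms dropped by max_df, min_df or max_features) can also be large; the scikit-learn documentation describes it as introspection-only and safe to remove before pickling. A sketch with hypothetical toy data and file name:

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["cheap pills now", "meeting at noon", "win cash now", "lunch at noon?"]  # hypothetical
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Optional: drop the introspection-only stop_words_ set (it can be large on real corpora).
vectorizer = model.named_steps["tfidfvectorizer"]
if hasattr(vectorizer, "stop_words_"):
    vectorizer.stop_words_ = None

path = os.path.join(tempfile.mkdtemp(), "classifier.joblib")  # hypothetical file name
joblib.dump(model, path, compress=3)  # compress ranges from 0 (off) to 9 (smallest, slowest)
restored = joblib.load(path)
print(os.path.getsize(path), "bytes")
```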

0 votes | 0 answers | 30 views

Lasso Regression with weighted input in Python

I am interested in applying the Lasso regression technique in Python. However, I would like to weight the input data for the algorithm. Can you suggest some libraries that can perform the Lasso regression ...
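
One way to get a weighted Lasso from scikit-learn's plain Lasso is to multiply each row of X and y by sqrt(w_i) and fit without an intercept, which minimizes (1/(2n)) * sum_i w_i * (y_i - x_i.b)^2 + alpha * ||b||_1. If an intercept is needed, center X and y with weighted means (np.average(..., weights=w)) first and recover it afterwards. Releases from 0.23 onward also accept sample_weight in Lasso.fit directly. The data, weights, and alpha below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.randn(50)
w = rng.uniform(0.5, 2.0, size=50)  # hypothetical per-sample weights

# Scale every row by sqrt(w_i) so each squared residual is multiplied by w_i.
# fit_intercept=False: the data here has no intercept, and centering would undo the scaling.
sw = np.sqrt(w)
lasso = Lasso(alpha=0.05, fit_intercept=False)
lasso.fit(X * sw[:, None], y * sw)
print(lasso.coef_)
```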

0 votes | 1 answer | 25 views

Scikit-Learn SVC hangs on small data set

I'm trying to use scikit-learn to fit an SVM to my data. However, Python hangs on the last line below when I try to fit the data. I let this run for 12 hours before killing it. trainX has 100 ...
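
The excerpt is cut off, so the actual cause is unknown; one frequent reason for libsvm to run for a very long time is features on very different scales (the scikit-learn docs note that SVMs are not scale invariant). A hedged sketch: standardize inside a pipeline and cap the solver with max_iter so a badly conditioned fit stops with a ConvergenceWarning instead of running for hours. The data is a hypothetical stand-in for trainX and its labels.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Hypothetical stand-in: 100 samples whose features span several orders of magnitude.
trainX = rng.randn(100, 5) * np.array([1.0, 10.0, 1e2, 1e3, 1e4])
trainY = (trainX[:, 0] + trainX[:, 4] / 1e4 > 0).astype(int)

model = make_pipeline(
    StandardScaler(),  # give every feature zero mean and unit variance
    SVC(kernel="rbf", C=1.0, max_iter=100_000),  # hard cap on solver iterations
)
model.fit(trainX, trainY)
print(model.score(trainX, trainY))
```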

0 votes | 1 answer | 66 views

One-hot encoder confusion

This is what I have done. I think something is going on with the one-hot encoder.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
X, y = ...
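
The question's code is truncated, so this only illustrates what OneHotEncoder itself produces, assuming a single string-valued column (the colors data is hypothetical): one output column per category, ordered as listed in categories_, returned as a sparse matrix by default. String input needs scikit-learn 0.20 or later, and handle_unknown="ignore" maps unseen categories to an all-zero row.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single categorical column (2-D: one row per sample, one column).
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

enc = OneHotEncoder(handle_unknown="ignore")
encoded = enc.fit_transform(colors).toarray()  # densify the default sparse output

print(enc.categories_)  # categories are sorted: blue, green, red
print(encoded)

# An unseen category encodes to all zeros instead of raising an error.
unseen = enc.transform(np.array([["purple"]])).toarray()
```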

1 vote | 1 answer | 56 views

How can I reduce memory usage of scikit-learn vectorizers?

TfidfVectorizer takes a lot of memory: vectorizing 470 MB of 100k documents takes over 6 GB. If we go to 21 million documents, it will not fit in the 60 GB of RAM we have. So we switched to HashingVectorizer, but ...
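
HashingVectorizer keeps no vocabulary in memory, so it is stateless and batches of documents can be transformed independently; it does not compute IDF weights, so one option is to follow it with TfidfTransformer. A sketch under those assumptions: n_features=2**20 is an arbitrary choice (more features means fewer hash collisions), and the documents and batch split are hypothetical.

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer

docs = ["the cat sat", "the dog sat", "a cat and a dog", "dogs bark"]  # hypothetical

# No vocabulary is stored: each token is hashed straight to one of n_features columns.
# alternate_sign=False and norm=None keep the output as plain non-negative counts.
hasher = HashingVectorizer(n_features=2**20, alternate_sign=False, norm=None)

# Because the hasher is stateless, batches can be transformed separately (e.g. streamed from disk).
batches = [docs[:2], docs[2:]]
counts = sp.vstack([hasher.transform(batch) for batch in batches]).tocsr()

# Add IDF weighting (and L2 normalization) on top of the hashed counts.
tfidf = TfidfTransformer().fit_transform(counts)
print(tfidf.shape, tfidf.nnz)
```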

1 vote | 0 answers | 22 views

NaN/inf values in scikit-learn manifold learning functions

I have a manifold learning / non-linear dimensionality reduction problem where I know distances between objects up to some threshold, and beyond that I just know that the distance is "far". Also, in ...
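
scikit-learn's manifold estimators validate their input and reject NaN and infinite values, so one workaround (an assumption, not an established fix) is to replace every unknown "far" distance with a finite value above the threshold and pass the matrix to MDS as precomputed dissimilarities. The distance matrix, the threshold, and the cap of 2 × threshold are hypothetical choices, and the cap will distort the embedding to some degree.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical symmetric distance matrix; np.inf marks pairs only known to be "far".
D = np.array([
    [0.0, 1.0, 2.0, np.inf],
    [1.0, 0.0, 1.0, np.inf],
    [2.0, 1.0, 0.0, 1.5],
    [np.inf, np.inf, 1.5, 0.0],
])
threshold = 3.0  # hypothetical: distances beyond this are not known exactly

# Replace every unknown distance with a finite cap above the threshold.
D_capped = np.where(np.isfinite(D), D, 2 * threshold)

mds = MDS(n_components=2, dissimilarity="precomputed", n_init=4, random_state=0)
embedding = mds.fit_transform(D_capped)
print(embedding)
```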

1 vote | 0 answers | 19 views

scikit 0.14 multi-label metrics

I just installed scikit 0.14 so that I could explore the multi-label metrics improvements. I got some positive results with the Hamming loss metric and the classification report, but was not able to ...
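
In current scikit-learn releases (not the 0.14 release the question uses), multi-label targets are passed to the metrics as binary indicator matrices, for example built with MultiLabelBinarizer, and averaging options such as average="samples" then apply. A small sketch with hypothetical labels:

```python
from sklearn.metrics import classification_report, f1_score, hamming_loss
from sklearn.preprocessing import MultiLabelBinarizer

true_labels = [["sports"], ["politics", "tech"], ["tech"]]  # hypothetical
pred_labels = [["sports"], ["politics"], ["tech", "sports"]]

mlb = MultiLabelBinarizer()
y_true = mlb.fit_transform(true_labels)  # columns follow mlb.classes_: politics, sports, tech
y_pred = mlb.transform(pred_labels)

loss = hamming_loss(y_true, y_pred)  # fraction of individual label assignments that are wrong
f1 = f1_score(y_true, y_pred, average="samples")  # F1 computed per sample, then averaged
print(loss, f1)
print(classification_report(y_true, y_pred, target_names=list(mlb.classes_)))
```

Here two of the nine label slots disagree, so the Hamming loss is 2/9.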
