33
votes
1 answer
765 views

Fitting a scikits.learn.hmm.GaussianHMM to variable length training sequences

I'd like to fit a scikits.learn.hmm.GaussianHMM to training sequences of different lengths. The fit method, however, prevents using sequences of different lengths by doing obs = np.asanyarray(obs) ...
9
votes
6 answers
3k views

fastest SVM implementation usable in python

I'm building some predictive models in python and have been using scikits learn's SVM implementation. It's been really great, easy to use, and relatively fast. Unfortunately, I'm beginning to become ...
9
votes
2 answers
1k views

Save NaiveBayes classifier to disk in Scikits learn

How do I save a trained Naive Bayes classifier to disk and use it to predict data? I have the following sample program from the Scikits learn website: from sklearn import datasets iris = ...
9
votes
4 answers
3k views

Is it possible to specify your own distance function using Scikits.Learn K-Means Clustering?

Is it possible to specify your own distance function using Scikits.Learn K-Means Clustering? If so, how and where?
8
votes
2 answers
858 views

Best Machine Learning package for Python 3x?

I was bummed out to see that scikit-learn does not support Python 3... Is there a comparable package anyone can recommend for Python 3?
7
votes
1 answer
420 views

Distinguishing overfitting vs good prediction

These are questions on how to calculate & reduce overfitting in machine learning. I think many new to machine learning will have the same questions, so I tried to be clear with my examples and ...
7
votes
2 answers
1k views

Python List of Ngrams with frequencies

I need to get the most popular ngrams from a text. Ngram length must be from 1 to 5 words. I know how to get bigrams and trigrams. For example: bigram_measures = nltk.collocations.BigramAssocMeasures() ...
7
votes
1 answer
346 views

In scikit-learn, how to deal with data that mixes numerical and nominal values?

I know that the computation in scikit is based on numpy, so everything is a matrix or an array. But I don't know how the package deals with data that mixes numerical and nominal values. For example, a ...
7
votes
3 answers
404 views

Why are LASSO in sklearn (python) and matlab statistical package different?

I am using LassoCV from sklearn to select the best model by cross-validation. I found that cross-validation gives different results depending on whether I use sklearn or the MATLAB statistical toolbox. I ...
6
votes
1 answer
679 views

How to extract info from scikits.learn classifier to then use in C code

I have trained a bunch of RBF SVMs using scikits.learn in Python and then pickled the results. These are for image processing tasks, and one thing I want to do for testing is run each classifier on ...
6
votes
1 answer
237 views

Counting with scipy.sparse

I am using the Python sklearn libraries. I have 150,000+ sentences. I need an array-like object where each row corresponds to a sentence, each column corresponds to a word, and each element is the number ...
6
votes
2 answers
154 views

Keep pandas structure with numpy/scikit functions

I'm using the excellent read_csv() function from pandas, which gives: In [31]: data = pandas.read_csv("lala.csv", delimiter=",") In [32]: data Out[32]: <class 'pandas.core.frame.DataFrame'> ...
5
votes
2 answers
557 views

How to get most informative features for scikit-learn classifiers?

The classifiers in machine learning packages like liblinear and nltk offer a method show_most_informative_features(), which is really helpful for debugging features: viagra = None ok : spam ...
5
votes
1 answer
851 views

Use scikit-learn to classify into multiple categories

I'm trying to use one of scikit-learn's supervised learning methods to classify pieces of text into one or more categories. The predict function of all the algorithms I tried just returns one match. ...
5
votes
1 answer
316 views

Scaling data in scikit-learn SVM

While libsvm provides tools for scaling data, with scikit-learn (which should be based on libsvm for the SVC classifier) I find no way to scale my data. Basically I want to use 4 features, of ...
