Tagged Questions
1
vote
1 answer
35 views
Possible to make a ROC plot from SVM with precomputed kernel in scikit-learn?
I'm using this example for creating a ROC plot from SVM classification results: http://scikit-learn.org/0.13/auto_examples/plot_roc.html
However, each data point effectively consists of 4 length-d ...
0
votes
1 answer
39 views
Performing Euclidean Distance Measure on Documents (i.e., text) Using Sci-Kit
I'm new to machine learning. After a lot of research, I've decided to use Sci-Kit Learn as much as possible in my efforts. But I'm still at square one.
What I would like to do is perform a ...
2
votes
1 answer
55 views
Comparing computer vision libraries in python [closed]
I want to decide on a Python computer vision library. I have used OpenCV in C++ and like it very much. However, this time I need to develop my algorithm in Python. My short list has three libraries:
...
0
votes
1 answer
28 views
Python Scikit-learn: Empty Vocabulary in TF-IDF
I am using the code given in the most up-voted answer to this question (Similarity between two text documents) to calculate TF-IDF between documents. However, I observe that when I run the code WITHOUT ...
0
votes
2 answers
38 views
grid search cross-validation on SVC probability output in sci-kit learn
I'd like to run a grid search cross-validation on the probability outputs of the SVC classifier. In particular I'd like to minimize the negative log likelihood. From the documentation it seems that ...
2
votes
1 answer
27 views
Different results when using sklearn RandomizedPCA with sparse and dense matrices
I am getting different results when using RandomizedPCA with sparse and dense matrices:
import numpy as np
import scipy.sparse as scsp
from sklearn.decomposition import RandomizedPCA
x = ...
0
votes
1 answer
26 views
Reinitializing learned linear models with scikit-learn
Say I run SGDRegressor or SGDClassifier, and get a set of coefficients that I want to use for the future. It's definitely trivial to do the basic predictions (since, for the regressor, it's just ...
0
votes
1 answer
63 views
Train two models concurrently
All I need to do is train two regression models (using scikit-learn) on the same data at the same time, using different cores. I've tried to figure it out by myself using Process, without success.
gb1 ...
0
votes
2 answers
29 views
Stochastic Gradient Boosting giving unpredictable results
I'm using the Scikit module for Python to implement Stochastic Gradient Boosting. My data set has 2700 instances and 1700 features (x) and contains binary data. My output vector is 'y', and contains 0 ...
1
vote
1 answer
49 views
Error when calling scikit-learn using AMD64 build of Scipy on Windows
I am getting this error on this line:
from sklearn.ensemble import RandomForestClassifier
The error log is:
Traceback (most recent call last):
File "C:\workspace\KaggleDigits\KaggleDigits.py", ...
-1
votes
1 answer
30 views
Classify collection of images [closed]
I have a folder with a collection of images from a microscope, and I have to separate them into two classes (samples with defects and without defects). Additionally I've got sets of already classified ...
0
votes
1 answer
20 views
Can DecisionTreeClassifier be used with both binary and multiclass labels simultaneously?
Can I fit the classifier with binary and multiclass labels to predict a result?
Multiclass labels can have more than 2 values; binary labels can only have 2.
Example (the first parameter in X is ...
0
votes
2 answers
43 views
get the best features from matrix n X m
I have a matrix X with 1000 features (columns) and 100 rows of float elements, and a target vector y with two classes, 0 and 1; the dimension of y is (100,1). I want to compute the 10 best features in ...
0
votes
1 answer
48 views
Residuals of Random Forest Regression (Python)
When using RandomForestRegressor from Sklearn, how do you get the residuals of the regression? I would like to plot out these residuals to check the linearity.
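A minimal sketch of one way to get such residuals, assuming they are defined as the observed target minus the model's prediction. The synthetic data and the variable names (`X`, `y`, `residuals`) are illustrative assumptions, not taken from the question:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely for illustration (not from the question).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Residual = observed target minus model prediction.
y_pred = model.predict(X)
residuals = y - y_pred
```

These are in-sample residuals, which tend to look optimistic for a random forest; computing them on held-out data is usually more informative before plotting.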
0
votes
2 answers
63 views
Regression with Date variable using Scikit-learn
I have a Pandas DataFrame with a date column (e.g., 2013-04-01) of dtype datetime.date. When I include that column in X_train and try to fit the regression model, I get the error float() argument must ...
0
votes
0 answers
42 views
Randomized stratified k-fold cross-validation in scikit-learn?
Is there any built-in way to get scikit-learn to perform shuffled stratified k-fold cross-validation? This is one of the most common CV methods, and I am surprised I couldn't find a built-in method to ...
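A hedged sketch of shuffled stratified k-fold splitting, assuming a recent scikit-learn release where `StratifiedKFold` lives in `sklearn.model_selection` and accepts `shuffle` and `random_state`. The release in use when the question was asked may have differed, and the toy arrays are made up for illustration:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 20 samples, two balanced classes (illustrative only).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# shuffle=True randomizes sample order within each class before splitting.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(cv.split(X, y))

for train_idx, test_idx in folds:
    # Each test fold keeps the class balance of the full data set.
    print(np.bincount(y[test_idx]))
```

Fixing `random_state` makes the shuffled folds reproducible across runs.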
0
votes
1 answer
48 views
pyinstaller how to create 64 bit exe on 64 bit window without ImportError: DLL load failed: %1 is not a valid Win32 application
I am trying to create a 64-bit exe for 64-bit Windows 8 using pyinstaller. I have used 64-bit python2.7 and pyqt4, numpy-mkl, scipy, scikit, matplotlib, pywin32, MS Visual C++ 2008; all are 64 bit ...
1
vote
1 answer
39 views
How to obtain information gain using scikit-learn?
I see that DecisionTreeClassifier accepts criterion='entropy', which means that it must be using information gain as a criterion for splitting the decision tree.
What I need is the information gain ...
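For reference, information gain for a single binary split can be computed by hand from the textbook definition (parent entropy minus the size-weighted entropy of the children). This is a standalone sketch, not a scikit-learn API; the helper names `entropy` and `information_gain` are hypothetical:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# A split that separates the two classes perfectly.
parent = [0, 0, 1, 1]
gain = information_gain(parent, [0, 0], [1, 1])
```

With a perfectly separating split the children are pure, so the gain equals the parent's entropy.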
0
votes
2 answers
65 views
Loop that will create new Pandas.DataFrame column
Following the scikit-learn tutorial here, if we have a Pandas.DataFrame that has a column named colors, how can we create a loop to loop through all of the DataFrame's columns (or a list containing ...
3
votes
2 answers
92 views
scikit-learn DBSCAN memory usage
I have a dataset with ~2.5 million samples, each with 35 features (floating point values) that I'm trying to cluster. I've been trying to do this with scikit-learn's implementation of DBSCAN, using ...
1
vote
0 answers
45 views
Create and fit a Multiplicative linear regression using Python/Sklearn
I'm using Python 2.7 and Scikit-learn to fit a dataset using multiplicative linear regression, where the different terms are multiplied together instead of added together like in ...
0
votes
1 answer
26 views
What does Grid_scores_ mean in Scikit-learn's GridSearchCV
After performing a grid search with sklearn.grid_search.GridSearchCV() on a linear_model.Ridge to find a suitable alpha, we can get the grid scores using clf.grid_scores_.
What do the numbers in the ...
1
vote
2 answers
87 views
Python MemoryError when doing fitting with Scikit-learn
I am running Python 2.7 (64-bit) on a Windows 8 64-bit system with 24GB memory. When fitting the usual sklearn.linear_model.Ridge, the code runs fine.
Problem: However when using ...
1
vote
1 answer
76 views
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
I am trying to calculate the Mean Squared Error of the predictions y_train_actual from my scikit-learn model against the original values salaries.
Problem: However with ...
2
votes
2 answers
85 views
Random Forest interpretation in scikit-learn
I am using sklearn.ensemble.RandomForestRegressor to fit a random forest regressor on a dataset. Now that I have the results, is it possible to interpret this in some format where I can then ...
2
votes
2 answers
70 views
How many features can scikit-learn handle?
I have a csv file of [66k, 56k] size (rows, columns). It's a sparse matrix. I know that numpy can handle a matrix of that size. I would like to know, based on everyone's experience, how many features ...
0
votes
1 answer
34 views
Enable Python to utilize all cores for fitting scikit-learn models
I'm running python 2.7 with ipython on Windows 8 64bit with a system that has 4 cores. When fitting a scikit-learn model, the CPU usage is 50%, 25% from python and 25% from Chrome.
Why is Chrome ...
-1
votes
2 answers
51 views
Why is python's hstack used here for machine learning
I am trying to understand some python code that tries to predict prices based on an ad posting.
Before fitting the text vectorizers, a function does hstack((des, titles)) to the ad description des ...
0
votes
1 answer
35 views
Find the Most common term in Scikit-learn classifier [duplicate]
I'm following the example in the scikit-learn docs where CountVectorizer is used on some dataset.
Question: count_vect.vocabulary_.viewitems() lists all the terms and their frequencies. How do you sort ...
1
vote
1 answer
42 views
Restrictions in terms of using external libraries (Python) in a Storm Bolt
I want to implement a Bolt (https://github.com/nathanmarz/storm) that does some heavy processing on tuples using the scikit-learn machine learning API (http://scikit-learn.org/)
For example -
from sklearn ...