Tagged Questions
2
votes
0 answers
27 views
Running R caret in Python script [on hold]
I am currently trying to transition from R to Python to streamline working with web-based applications. I know that scikit-learn has a lot of functionality parallel to R's, including ...
0
votes
1 answer
39 views
SciPy installation: it's installed, so why the traceback?
I am running the following code (this is the beginning snippet):
# Import necessary modules and functions
########################################
# time module, for timing it.
from time import time
...
0
votes
0 answers
25 views
Give weights to features [on hold]
I want to perform text classification on tweets and give more importance to the hashtags than to the other words. How exactly do I do that? I'm using scikit-learn, and it has an option to show the ...
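One possible approach is sketched below. It assumes that hashtags are marked with `#`. The words and the hashtags are vectorized by two separate `TfidfVectorizer`s, and `FeatureUnion` combines them. Its `transformer_weights` parameter scales each block, so the hashtag columns can be made to count more. The weight `3.0` and the sample tweets are illustrative only. The hashtag vectorizer raises an "empty vocabulary" error if no training document contains a hashtag.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Hypothetical sample tweets.
tweets = [
    "Loving the new release #python #nlp",
    "Rainy day, staying inside with a book",
]

union = FeatureUnion(
    [
        # Ordinary word tokens: the default pattern keeps "python" from "#python" but drops the '#'.
        ("words", TfidfVectorizer()),
        # Hashtag tokens only, with the '#' kept, e.g. "#python".
        ("tags", TfidfVectorizer(token_pattern=r"#\w+")),
    ],
    # Hypothetical weighting: hashtag features are scaled by 3, word features by 1.
    transformer_weights={"words": 1.0, "tags": 3.0},
)

X = union.fit_transform(tweets)  # sparse matrix: word columns, then hashtag columns
print(X.shape)
```

Any linear classifier trained on `X` then sees larger values in the hashtag columns. Whether that improves accuracy has to be checked by cross-validation.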
0
votes
1 answer
14 views
Ensemble learning with 2 classifiers
I'm trying to combine two approaches to classifying my data: one is an SVM, and the other is an external classifier that outputs one or more labels for what it thinks the observation is. Is it ...
0
votes
0 answers
35 views
Time series forecasting with support vector regression
I'm trying to perform a simple time series prediction using support vector regression.
I am trying to understand the answer provided here.
I adapted Tom's code to reflect the answer provided:
...
0
votes
1 answer
30 views
Online version of scikit-learn's TfidfVectorizer
I'm looking to use scikit-learn's HashingVectorizer because it's a great fit for online learning problems (new tokens in text are guaranteed to map to a "bucket"). Unfortunately the implementation ...
1
vote
1 answer
15 views
Python scikit-learn: Cannot clone object… as the constructor does not seem to set parameter
I modified the BernoulliRBM class of scikit-learn to use groups of softmax visible units. In the process, I added an extra NumPy array visible_config as a class attribute which is initialized in the ...
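This error comes from `clone()`. It rebuilds the estimator from `get_params()` and then checks that the constructor stored each parameter unchanged. The sketch below follows scikit-learn's estimator convention under that reading. `__init__` only assigns its arguments, and any derived array is built in `fit()` under a trailing-underscore name. The class `GroupedSoftmaxRBM` is hypothetical and only illustrates the convention; it is not the modified `BernoulliRBM` from the question.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


class GroupedSoftmaxRBM(BaseEstimator):
    """Hypothetical estimator illustrating the constructor convention."""

    def __init__(self, n_components=256, visible_config=None):
        # Store every constructor argument unchanged, under the same name.
        # Converting it here (e.g. np.asarray(...)) is what makes clone() fail.
        self.n_components = n_components
        self.visible_config = visible_config

    def fit(self, X, y=None):
        X = np.asarray(X)
        # Derived state belongs in fit(), with a trailing underscore.
        config = self.visible_config if self.visible_config is not None else [X.shape[1]]
        self.visible_config_ = np.asarray(config, dtype=int)
        return self


est = GroupedSoftmaxRBM(visible_config=[2, 3])
cloned = clone(est)  # works because __init__ stored visible_config exactly as passed
print(cloned.get_params()["visible_config"])
```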
3
votes
2 answers
72 views
Quickest linear regression implementation in python
I'm performing a stepwise model selection, progressively dropping variables with a variance inflation factor over a certain threshold.
In order to do this, I'm running OLS many, many times on ...
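A lightweight sketch of the VIF step is shown below. Each VIF is computed as 1 / (1 − R²), where R² comes from regressing one column on an intercept plus the remaining columns. The fits use `numpy.linalg.lstsq` directly instead of building a full OLS model object each time. Whether this is the quickest option for a given dataset is an assumption to benchmark. The threshold of 5.0 is a common rule of thumb, not a value taken from the question.

```python
import numpy as np


def vif(X):
    """VIF of each column of X (n_samples, n_features); assumes no constant columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_high_vif(X, threshold=5.0):
    """Repeatedly drop the column with the largest VIF until all are below threshold."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        del keep[worst]
    return keep  # indices of the retained columns
```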
0
votes
2 answers
23 views
Error Converting Sparse Matrix to Array with scipy.sparse.csc_matrix.toarray()
I have a scipy.sparse.csc_matrix that I am trying to transform into an array with scipy.sparse.csc_matrix.toarray(). When I use the function for a small dataset it works fine. However, when I use it ...
1
vote
2 answers
21 views
What kind of input is required for sklearn's SVC's scoring methods?
So I am trying to build a classifier and score its performance. This is my code:
def svc(train_data, train_labels, test_data, test_labels):
    from sklearn.svm import SVC
    from sklearn.metrics ...
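The sketch below contrasts the two input conventions, assuming the goal is accuracy. Functions in `sklearn.metrics` take `(y_true, y_pred)`, i.e. two label arrays. An estimator's own `score` method takes `(X, y)` and calls `predict` internally. The synthetic data from `make_classification` is a stand-in, since the question's data is not shown. The imports are moved to module level.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def svc(train_data, train_labels, test_data, test_labels):
    clf = SVC()
    clf.fit(train_data, train_labels)
    # sklearn.metrics functions take (y_true, y_pred): two label arrays, not features.
    predictions = clf.predict(test_data)
    metric_acc = accuracy_score(test_labels, predictions)
    # The estimator's score() takes (X, y) and predicts internally.
    method_acc = clf.score(test_data, test_labels)
    return metric_acc, method_acc


# Synthetic stand-in data; the question's own data is not shown.
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(svc(X_train, y_train, X_test, y_test))
```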
3
votes
1 answer
37 views
Correlating n number of coordinate points
I have a set of coordinates that I get from identifying some sources in an image and I have another set of coordinates from a text file catalog that correspond to the sources in the image. I wanted to ...
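If both coordinate sets are already in the same frame (no rotation, scale or offset between them), nearest-neighbour matching is one straightforward option. That is a strong assumption; otherwise a transform has to be fitted first. The sketch uses SciPy's `cKDTree` and accepts only matches within a tolerance `max_dist`. The tolerance and the sample coordinates are hypothetical. Several image sources can match the same catalogue entry, so duplicates may need resolving.

```python
import numpy as np
from scipy.spatial import cKDTree


def match_sources(image_xy, catalog_xy, max_dist):
    """Return (image_idx, catalog_idx, distance) for nearest-neighbour matches."""
    tree = cKDTree(catalog_xy)
    dist, idx = tree.query(image_xy, k=1)  # nearest catalogue entry per image source
    ok = dist <= max_dist  # reject sources with no catalogue entry nearby
    return np.flatnonzero(ok), idx[ok], dist[ok]


catalog = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 5.0]])
image = np.array([[10.2, 9.9], [0.1, -0.1], [50.0, 50.0]])
img_idx, cat_idx, d = match_sources(image, catalog, max_dist=1.0)
print(img_idx, cat_idx)
```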
0
votes
1 answer
32 views
sklearn.cross_validation.cross_val_score multiple cpu?
I am trying to get a score for a model through cross-validation with sklearn.cross_validation.cross_val_score. According to its documentation, the parameter n_jobs sets the number of CPUs that you can ...
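The sketch below shows `n_jobs` in use. It assumes scikit-learn 0.18 or later, where `cross_val_score` lives in `sklearn.model_selection`; the `sklearn.cross_validation` module named in the question was later removed. The estimator and synthetic data are placeholders. Parallelism is across folds, so with `cv=5` at most five workers do useful work.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def main():
    # Placeholder data and model.
    X, y = make_classification(n_samples=500, random_state=0)
    # n_jobs=-1 evaluates the folds in parallel on all available CPUs.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, n_jobs=-1)
    print(scores.mean())
    return scores


if __name__ == "__main__":
    # The guard matters on platforms that start workers by spawning (e.g. Windows).
    main()
```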
-2
votes
1 answer
31 views
Runtime error when using sklearn in Python 3.2 (works fine in Python 2.7) - how to fix it?
I am trying to learn sklearn, and I encounter the error below when I run import sklearn. However, when I run the exact same code using Python 2.7, I do not encounter any errors.
import sklearn
...
2
votes
0 answers
31 views
scikit-learn svm module and predict function not working
I am trying to get an SVM to work using scikit-learn but cannot get the results I am expecting. I would like to use k-means to classify roughly 2-5 data clusters and then use an SVM to build a model ...
0
votes
0 answers
33 views
How to sort a Python csr_matrix by data
I want to extract the keywords of a text with the TF-IDF method in sklearn.
I have set up a TF-IDF vectorizer; see the code below:
from sklearn.feature_extraction import text
tfidf_vect = text.TfidfVectorizer()
texts = ...
0
votes
1 answer
37 views
Label encoding across multiple columns in scikit-learn
I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; ...
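In scikit-learn 0.20 and later, which postdate this question, `OrdinalEncoder` covers this case. One encoder handles every column of a 2-D input and learns a separate, sorted category list per column. The sketch assumes the goal is integer codes for features rather than for a target. The two-column frame is a hypothetical stand-in for the 50+ columns.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for a wide frame of string labels.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "L", "M"]})

# One encoder for all columns; categories are learned per column, in sorted order.
enc = OrdinalEncoder()
encoded = pd.DataFrame(enc.fit_transform(df), columns=df.columns, index=df.index)
print(encoded)
print(enc.categories_)  # one array of categories per column

# The mapping is reversible with the same encoder object.
decoded = enc.inverse_transform(encoded)
```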
-4
votes
0 answers
37 views
Data manipulation and scripting in scikit-learn [closed]
[Couldn't find answer on Scikit-learn website]
1) Are there any documents that give a general overview of how to manipulate datasets in scikit-learn?
Example questions like: How to select ...
2
votes
0 answers
21 views
Creating a sklearn.linear_model.LogisticRegression instance from existing coefficients
Can one create such an instance based on existing coefficients which were calculated say in a different implementation (e.g. Java)?
I tried creating an instance then setting coef_ and intercept_ ...
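A sketch of the attribute-setting approach for a binary model is shown below. Besides `coef_` and `intercept_`, prediction also needs `classes_`. Setting `n_features_in_` (scikit-learn 0.24+) keeps input validation consistent. This relies on fitted-attribute names rather than a documented API, so it may need adjusting between versions. The coefficient values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Coefficients assumed to come from another implementation (values are made up).
coef = np.array([[2.0, -1.0]])  # shape (1, n_features) for a binary problem
intercept = np.array([0.5])

clf = LogisticRegression()
# Setting fitted attributes by hand relies on scikit-learn internals,
# not a documented API.
clf.coef_ = coef
clf.intercept_ = intercept
clf.classes_ = np.array([0, 1])
clf.n_features_in_ = coef.shape[1]

X = np.array([[1.0, 0.0], [0.0, 3.0]])
print(clf.decision_function(X))  # X @ coef.T + intercept
print(clf.predict(X))
```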
0
votes
1 answer
28 views
Variable time steps in observations fed into hidden markov model
I'm guessing this is not possible, at least with the standard HMM implementation of scikit-learn, but I wanted to ask this question to see if there are any other approaches to this issue.
The problem ...
0
votes
0 answers
29 views
How to make effective use of a tutorial? [closed]
I am trying to learn programming (Python, scikit-learn), and I have found that most introductions use a tutorial format: text with pieces of code.
I found that reading these pages does not allow me to ...
1
vote
1 answer
28 views
sklearn.linear_model.LogisticRegression returns different coefficients every time although random_state is set
I'm fitting a logistic regression model and am setting the random state to a fixed value.
Every time I do a "fit" I get different coefficients, example:
...
0
votes
0 answers
32 views
Optimising accuracy for OneClassSVM
I have a problem that requires a one-class classification system. I am currently using Python for development and, as a result, scikit-learn for machine learning tasks.
From ...
0
votes
0 answers
18 views
scikit read tree from dot file
I'm using scikit-learn to generate a decision tree, which I then save to a file using export_graphviz. Here's the code:
# Generate the table
X,Y = self.generate_table()
# Generate the tree
...
0
votes
0 answers
35 views
SVR keeps predicting flat line
I was trying to use support vector regression to predict future returns by 'feeding' it the returns of the last five days. Here's a link for the idea: ...
1
vote
1 answer
78 views
scikit-learn joblib bug: multiprocessing pool self.value out of range for 'i' format code, only with large numpy arrays
My code runs fine with smaller test samples, like 10000 rows of data in X_train, y_train. When I call it for millions of rows, I get the resulting error. Is the bug in a package, or can I do something ...
0
votes
1 answer
24 views
How to plot text documents in a scatter map?
I'm using scikit to perform text classification and I'm trying to understand where the points lie with respect to my hyperplane to decide how to proceed. But I can't seem to plot the data that comes ...
1
vote
1 answer
24 views
Python: sklearn.linear_model.LinearRegression behaving strangely
I am trying to do multiple-variable linear regression, but sklearn.linear_model seems to behave very strangely. Here's my code:
import numpy as np
from sklearn import linear_model
b = ...
0
votes
1 answer
23 views
Adding words to scikit-learn's CountVectorizer's stop list
Scikit-learn's CountVectorizer class lets you pass a string 'english' to the argument stop_words. I want to add some things to this predefined list. Can anyone tell me how to do this?
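The built-in list is importable as `ENGLISH_STOP_WORDS`, a `frozenset`, so it can be extended with `union`. The sketch converts the result to a `list` because recent scikit-learn versions validate `stop_words` as `"english"`, a list or `None`. The extra words `"foo"` and `"bar"` are hypothetical.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer

# Extend the built-in English list with hypothetical extra words.
my_stop_words = list(ENGLISH_STOP_WORDS.union(["foo", "bar"]))

vect = CountVectorizer(stop_words=my_stop_words)
vect.fit(["the foo apple", "bar banana and the apple"])
print(sorted(vect.vocabulary_))  # built-in and added stop words are both excluded
```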
0
votes
1 answer
29 views
Implement k-means clustering, accelerated using the triangle inequality, in Python (Scikit learn)
I am attempting to run k-means clustering on a large dataset (9106 items, 100 dimensions). This makes it very slow so I have been recommended to use the triangle inequality as described by Charles ...
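Later scikit-learn releases (0.18 onward) include Elkan's triangle-inequality variant in `KMeans` through `algorithm="elkan"`. A hand-written implementation is therefore probably unnecessary. The sketch uses random data with 1,000 rows to stay quick; the question's dataset has 9106 rows and 100 dimensions. `n_clusters=8` is an arbitrary choice. The variant keeps extra bound arrays, so it uses more memory, and whether it is faster depends on the data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 100))  # random stand-in data with 100 dimensions

# algorithm="elkan" uses the triangle inequality to skip distance computations.
km = KMeans(n_clusters=8, algorithm="elkan", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.shape)
```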
1
vote
1 answer
29 views
Understanding format of data in scikit-learn
I am trying to work with multi-label text classification using scikit-learn in Python 3.x. I have data in libsvm format, which I am loading with the load_svmlight_file function. The data format is like ...
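A sketch of the multi-label format follows. Each line is `labels idx:value idx:value ...`, where the labels are comma-separated. With `multilabel=True`, `load_svmlight_file` returns `y` as a list of label tuples, one per sample. Most scikit-learn multi-label tools expect a binary indicator matrix, which `MultiLabelBinarizer` builds. The two sample lines are hypothetical and are read from an in-memory bytes buffer instead of a file.

```python
import io

from sklearn.datasets import load_svmlight_file
from sklearn.preprocessing import MultiLabelBinarizer

# Two hypothetical samples in multi-label libsvm format.
data = io.BytesIO(b"1,2 1:0.5 3:1.0\n3 2:2.0\n")

# multilabel=True makes y a list of tuples, one tuple of labels per sample.
X, y = load_svmlight_file(data, multilabel=True)
print(X.shape, y)

# Convert the label tuples into a binary indicator matrix (one column per label).
Y = MultiLabelBinarizer().fit_transform(y)
print(Y)
```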
0
votes
1 answer
21 views
Python scikit-learn (metrics): difference between r2_score and explained_variance_score?
I noticed that 'r2_score' and 'explained_variance_score' are both built-in sklearn.metrics methods for regression problems.
I was always under the impression that r2_score is the percent ...
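The two metrics differ when the residuals have a non-zero mean. Explained variance is 1 − Var(y_true − y_pred) / Var(y_true), so a constant bias in the predictions does not lower it. R² is 1 − Σ(y_true − y_pred)² / Σ(y_true − mean(y_true))², so the same bias does lower it. The toy arrays below are chosen so that every prediction is off by exactly 1.

```python
from sklearn.metrics import explained_variance_score, r2_score

y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 3.0, 4.0]  # every prediction is off by the same constant

# The residuals are all -1, so their variance is 0 and explained variance is perfect.
print(explained_variance_score(y_true, y_pred))  # 1.0
# R^2 = 1 - 3/2: the squared residuals (3) exceed the total sum of squares (2).
print(r2_score(y_true, y_pred))  # -0.5
```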
0
votes
1 answer
19 views
How to print estimated coefficients after GridSearchCV fits a model? (SGDRegressor)
I am new to scikit-learn, but it did what I was hoping for. Now, maddeningly, the only remaining issue is that I can't find how to print (or, even better, write to a small text file) all the ...
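With the default `refit=True`, the winning model is available as `best_estimator_`, and its `coef_` and `intercept_` can be printed or saved. The sketch assumes scikit-learn 0.18+ (`GridSearchCV` in `sklearn.model_selection`). The synthetic data and the `alpha` grid are hypothetical. `np.savetxt` writes to an in-memory buffer here; a file path works the same way.

```python
import io

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical linear data with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

grid = GridSearchCV(SGDRegressor(random_state=0), {"alpha": [1e-4, 1e-3, 1e-2]}, cv=3)
grid.fit(X, y)

# With refit=True (the default), best_estimator_ is refitted on all the data.
best = grid.best_estimator_
print(grid.best_params_)
print(best.coef_, best.intercept_)

# Write intercept followed by coefficients as one line of text.
buf = io.StringIO()
np.savetxt(buf, np.append(best.intercept_, best.coef_)[None, :], fmt="%.6f")
print(buf.getvalue())
```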
0
votes
1 answer
15 views
Where to put freeze_support() in a Python script?
I am confused about using freeze_support() for multiprocessing, and I get a RuntimeError without it. I am only running a script, not defining a function or a module. Can I still use it? Or the ...
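Even a plain script needs the `if __name__ == "__main__":` guard when multiprocessing starts workers by spawning, as on Windows. Each worker re-imports the script, and unguarded top-level code would start new processes again. The usual cause of that RuntimeError is the missing guard rather than a missing `freeze_support()`. `freeze_support()` goes as the first line under the guard. Per the Python documentation, it only has an effect in frozen Windows executables. The `square` function is a placeholder for real work.

```python
from multiprocessing import Pool, freeze_support


def square(x):
    # Placeholder task; worker functions must be defined at module level.
    return x * x


def main():
    with Pool(processes=2) as pool:
        print(pool.map(square, range(5)))


if __name__ == "__main__":
    # No effect outside frozen Windows executables, so calling it first is harmless.
    freeze_support()
    main()
```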
3
votes
0 answers
30 views
multiprocessing.Pool hangs if child causes a segmentation fault
I want to apply a function in parallel using multiprocessing.Pool.
The problem is that if one function call triggers a segmentation fault the Pool hangs forever.
Does anybody have an idea how I can make a ...
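One alternative is `concurrent.futures.ProcessPoolExecutor` (Python 3.3+) instead of `multiprocessing.Pool`. When a worker dies abruptly, the executor marks itself broken and pending futures raise `BrokenProcessPool` instead of hanging. In the sketch, `os._exit(1)` stands in for the segmentation fault, and the crash condition `x == 2` is hypothetical. Once the pool is broken, every outstanding task fails too, so tasks that did not crash may need resubmitting to a fresh executor.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def risky(x):
    # Hypothetical crash condition: os._exit() kills the worker without raising,
    # which the executor treats like any other abrupt worker death (e.g. a segfault).
    if x == 2:
        os._exit(1)
    return x * x


def run(values):
    results = {}
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(risky, v): v for v in values}
        for future, v in futures.items():
            try:
                results[v] = future.result()
            except BrokenProcessPool:
                # The pool is now unusable; record the failure instead of hanging.
                results[v] = None
    return results


if __name__ == "__main__":
    print(run([1, 2, 3, 4]))
```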
3
votes
2 answers
109 views
+500
Naive Bayes: Imbalanced Test Dataset
I am using scikit-learn Multinomial Naive Bayes classifier for binary text classification (classifier tells me whether the document belongs to the category X or not). I use a balanced dataset to train ...
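If the class balance of the test data is roughly known, one option is `MultinomialNB(class_prior=...)`. It replaces the priors learned from the balanced training set with the expected test proportions, listed in the order of `classes_`. The 90/10 split and the toy documents are hypothetical. Whether adjusted priors improve the metric you care about needs to be checked on held-out data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical balanced training set (1 = belongs to category X).
docs = ["cheap pills now", "meeting at noon", "cheap cheap offer", "lunch meeting today"]
labels = [1, 0, 1, 0]

X = CountVectorizer().fit_transform(docs)

# Hypothetical expected test proportions: 90% class 0, 10% class 1.
clf = MultinomialNB(class_prior=[0.9, 0.1]).fit(X, labels)
print(np.exp(clf.class_log_prior_))  # the supplied priors, not the 50/50 training split
```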
0
votes
1 answer
39 views
freeze_support bug in using scikit-learn in the Anaconda python distro?
I just want to be sure this is not about my code but it needs to be fixed in the relevant Python package. (By the way, does this look like something I can manually patch even before the vendor ships ...
0
votes
0 answers
28 views
Sklearn tf-idf fit_transform
I am trying to use sklearn's tf-idf.
import nltk
import string
import os
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem.porter import PorterStemmer
.....
.....
...
0
votes
1 answer
62 views
What does “The sum of true positives and false positives are equal to zero for some labels.” mean?
I'm using scikit-learn to perform cross-validation with StratifiedKFold to compute the F1 score, but it warns that for some of my labels the sum of true positives and false positives is equal to ...
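The warning means that in some fold the classifier never predicted a particular label. That label's precision is then 0/0, which also makes its F1 ill-defined. scikit-learn reports such values as 0 and warns. The sketch reproduces this with toy labels. Counting the predictions shows which labels are missing. The `zero_division` parameter (scikit-learn 0.22+) sets the reported value explicitly and silences the warning.

```python
from collections import Counter

from sklearn.metrics import precision_score

y_true = [0, 1, 1, 0]
y_pred = [0, 0, 0, 0]  # label 1 is never predicted: TP + FP = 0 for it

print(Counter(y_pred))  # shows which labels the model actually predicts
# Precision for label 1 is 0/0; zero_division=0 reports it as 0.0 without a warning.
print(precision_score(y_true, y_pred, average=None, zero_division=0))
```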
-2
votes
0 answers
19 views
Make a linear regression equation for every node (leaf) of a regression tree [closed]
I would like to know whether there is already a tool (module) in Python that fits a linear regression equation for every leaf of a regression tree. I would like to do something like here: How do I make a ...
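A simple do-it-yourself version is sketched below. It is not the M5 model-tree algorithm, because the splits are chosen by an ordinary `DecisionTreeRegressor` without regard to the leaf models. The tree is fitted first, then `tree.apply(X)` gives each sample's leaf id, and one `LinearRegression` is fitted per leaf. The class `LeafLinearModel`, `max_leaf_nodes=4` and the piecewise-linear data are hypothetical. Leaves with very few samples give unstable equations, so a `min_samples_leaf` setting may be worth adding.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class LeafLinearModel:
    """Hypothetical helper: a regression tree with a linear model in each leaf."""

    def __init__(self, max_leaf_nodes=4):
        self.tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
        self.models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)  # leaf id of every training sample
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.models[leaf] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        out = np.empty(X.shape[0])
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            out[mask] = self.models[leaf].predict(X[mask])
        return out

    def equations(self):
        # Leaf id -> (intercept, coefficients) of that leaf's linear equation.
        return {leaf: (m.intercept_, m.coef_) for leaf, m in self.models.items()}


X = np.linspace(-5, 5, 200).reshape(-1, 1)
y = np.where(X[:, 0] < 0, 2 * X[:, 0], 1 - 3 * X[:, 0])  # hypothetical piecewise-linear data
model = LeafLinearModel().fit(X, y)
for leaf, (b, w) in model.equations().items():
    print(f"leaf {leaf}: y = {b:.3f} + {w[0]:.3f} * x")
```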
0
votes
1 answer
34 views
How to pass attributes between classes (a cloning issue)
I have the following code that works fine as it is:
class Classifiers(object):
    """Multiple classifiers"""
    class SVM():
        """
        SVM Classifier Object.
        This is binary ...
1
vote
1 answer
21 views
Using img_to_graph from Sklearn
I have the following code:
import cv2
import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage
from sklearn.feature_extraction import image
from sklearn.cluster import ...
2
votes
1 answer
52 views
How do you estimate the performance of a classifier on test data?
I'm using scikit to make a supervised classifier and I am currently tuning it to give me good accuracy on the labeled data. But how do I estimate how well it does on the test data (unlabeled)?
Also, ...
3
votes
0 answers
25 views
Scoring function for RidgeClassifierCV
I'm trying to implement a custom scoring function for RidgeClassifierCV in scikit-learn. This involves passing a custom scoring function as the score_func when initializing the RidgeClassifierCV ...
0
votes
1 answer
26 views
ExtraTreeRegressor in scikit-learn (Python)
I have two questions on ExtraTreeRegressor in scikit-learn (Python).
1) Why is it not possible to increase the number of features above the dimension of the input space? The algorithm in [1] does not ...
3
votes
1 answer
38 views
Scikit-learn Custom Scoring Function
I'm currently using the development branch of scikit-learn: 0.15-git.
Trying to initialize a RidgeClassifierCV object with a custom scoring function is currently failing with error message TypeError: ...
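In current scikit-learn, a custom metric is passed to `RidgeClassifierCV` as `scoring`, wrapped with `make_scorer`. The 0.15-era `score_func` argument from the question is not used here. A user-defined function with the signature `(y_true, y_pred)` can be wrapped the same way as `f1_score` below. The sketch passes `cv=3` so the alpha search runs ordinary 3-fold cross-validation. The alpha grid and synthetic data are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV
from sklearn.metrics import f1_score, make_scorer

# Hypothetical binary classification data.
X, y = make_classification(n_samples=300, random_state=0)

# Wrap the metric with make_scorer and pass it as `scoring`;
# cv=3 makes the alpha search use ordinary 3-fold cross-validation.
clf = RidgeClassifierCV(alphas=[0.1, 1.0, 10.0], scoring=make_scorer(f1_score), cv=3)
clf.fit(X, y)
print(clf.alpha_)  # the alpha with the best cross-validated F1
```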
0
votes
0 answers
26 views
Python sklearn SVC.fit() raises an error
My previous version of sklearn was 0.13, and I have now updated it to 0.14.1. My code below no longer works (it worked well before updating). Does anyone know the reason? Here is my code and the result.
print ...
1
vote
2 answers
46 views
sklearn logistic regression - important features
I'm pretty sure this has been asked before, but I'm unable to find an answer.
Running logistic regression using sklearn in Python, I'm able to transform my dataset to its most important features using the ...
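To see which features a fitted logistic regression treats as important, one common heuristic is to rank them by the absolute value of `coef_`. That comparison is only meaningful when the features share a scale, so the sketch standardizes them first. `SelectFromModel` (scikit-learn 0.17+) performs the same coefficient-based selection as a transformer and replaces the older `transform` method on estimators. The synthetic data, where only features 0 and 2 matter, is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(int)  # hypothetical: only features 0 and 2 matter

# Coefficient magnitudes are only comparable when features share a scale.
X_std = StandardScaler().fit_transform(X)
clf = LogisticRegression().fit(X_std, y)

ranking = np.argsort(np.abs(clf.coef_[0]))[::-1]
print(ranking)  # feature indices, most important first

# SelectFromModel keeps features whose |coef| reaches the threshold (default here: the mean).
selector = SelectFromModel(clf, prefit=True)
print(selector.get_support(indices=True))
```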
1
vote
1 answer
109 views
MemoryError in toarray when using DictVectorizer of Scikit Learn
I am trying to implement the SelectKBest algorithm on my data to get the best features out of it. For this I am first preprocessing my data using DictVectorizer and the data consists of 1061427 rows ...
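Densifying with `toarray()` is not required for this pipeline. By default `DictVectorizer` returns a `scipy.sparse` matrix, and `SelectKBest` with `chi2` accepts sparse input directly. The sketch keeps everything sparse under that assumption. Note that `chi2` requires non-negative feature values. The records, labels and `k=2` are hypothetical.

```python
from scipy import sparse
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical records and labels.
records = [
    {"word": "a", "count": 3},
    {"word": "b", "count": 1},
    {"word": "a", "count": 2},
    {"word": "c", "count": 0},
]
y = [1, 0, 1, 0]

# Keep the matrix sparse: DictVectorizer returns scipy.sparse by default.
vec = DictVectorizer(sparse=True)
X = vec.fit_transform(records)

# chi2 and SelectKBest work on sparse input, so toarray() is never called.
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)
print(sparse.issparse(X_new), X_new.shape)
```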
1
vote
1 answer
23 views
How can I vectorize a list using sklearn DictVectorizer?
I found the following example on the sklearn docs site:
>>> measurements = [
... {'city': 'Dubai', 'temperature': 33.},
... {'city': 'London', 'temperature': 12.},
... {'city': 'San ...
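A version-independent sketch for list-valued fields follows. Each list item is expanded into its own indicator key, such as `"tags=hot": 1`, before the records reach `DictVectorizer`. The result mirrors the one-hot `"city=Dubai"` features that `DictVectorizer` creates for string values. The `tags` field and the helper `expand_lists` are hypothetical.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records with a list-valued "tags" field.
measurements = [
    {"city": "Dubai", "tags": ["hot", "dry"]},
    {"city": "London", "tags": ["wet"]},
]


def expand_lists(record):
    """Hypothetical helper: turn each list item into its own indicator feature."""
    out = {}
    for key, value in record.items():
        if isinstance(value, (list, tuple)):
            for item in value:
                out[f"{key}={item}"] = 1
        else:
            out[key] = value  # strings are one-hot encoded by DictVectorizer itself
    return out


vec = DictVectorizer()
X = vec.fit_transform([expand_lists(m) for m in measurements])
print(sorted(vec.vocabulary_))
print(X.toarray())
```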
0
votes
1 answer
36 views
Is it possible to reverse the transformation of KMeans in sklearn?
After clustering a dataset and then transforming the data to the distance from the centroids using sklearn.cluster.KMeans, is it possible to reverse the transformation, given the centroids, getting ...
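`KMeans.transform` returns Euclidean distances to the centroids, so in principle a point can be recovered from them. This works when there are at least d + 1 centroids in general position in d dimensions. Writing ||x − c_i||² = r_i² for each centroid and subtracting the i = 0 equation cancels ||x||², leaving a linear system 2(c_i − c_0)·x = ||c_i||² − ||c_0||² − r_i² + r_0². The sketch solves it with `numpy.linalg.lstsq`. With fewer centroids than that, the answer is not unique. The helper `invert_distances` and the 2-D random data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def invert_distances(centroids, dists):
    """Hypothetical helper: recover x from its Euclidean distances to known centroids."""
    C = np.asarray(centroids, dtype=float)
    r = np.asarray(dists, dtype=float)
    # 2 (c_i - c_0) . x = ||c_i||^2 - ||c_0||^2 - r_i^2 + r_0^2, for i = 1..k-1
    A = 2.0 * (C[1:] - C[0])
    b = (C[1:] ** 2).sum(axis=1) - (C[0] ** 2).sum() - r[1:] ** 2 + r[0] ** 2
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
D = km.transform(X)  # distances from each sample to each of the 4 centroids
X_back = np.array([invert_distances(km.cluster_centers_, d) for d in D])
print(np.abs(X_back - X).max())  # reconstruction error
```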