Tagged Questions
1
vote
1 answer
7 views
How to handle categorical variables in sklearn GradientBoostingClassifier?
I am attempting to train models with GradientBoostingClassifier using categorical variables.
The following is a primitive code sample, just for trying to input categorical variables into ...
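A minimal sketch of one common approach, assuming scikit-learn 0.20 or later: `GradientBoostingClassifier` only accepts numeric input, so the categorical columns are one-hot encoded inside a pipeline. The column names and data are hypothetical. (Newer releases also offer `HistGradientBoostingClassifier` with a `categorical_features` parameter, which may be an alternative.)

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column and one numeric column
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue", "red", "green"],
    "size": [1.0, 2.5, 3.1, 0.7, 1.8, 2.2, 0.9, 3.3],
})
y = np.array([0, 1, 1, 0, 0, 1, 0, 1])

# One-hot encode the string column; pass the numeric column through unchanged
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)
model = Pipeline([
    ("prep", preprocess),
    ("gbc", GradientBoostingClassifier(n_estimators=50, random_state=0)),
])
model.fit(df, y)

# An unseen category ("purple") is encoded as all zeros instead of raising an error
pred = model.predict(pd.DataFrame({"color": ["blue", "purple"], "size": [2.0, 1.0]}))
print(pred)
```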
2
votes
2 answers
44 views
Efficient way to cluster colors using K-Nearest
I am trying to cluster the colors in an image into predefined classes (black, white, blue, green, red). I'm using the following code:
import numpy as np
import cv2
src = cv2.imread('objects.png')
...
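Because the classes are fixed in advance, this is a 1-nearest-neighbour lookup against five known colors rather than a clustering problem. A vectorized NumPy sketch, assuming the image is a BGR `uint8` array as returned by `cv2.imread`; the palette values are assumptions, and a tiny hand-made array stands in for the image:

```python
import numpy as np

# Predefined classes in BGR order (cv2.imread returns BGR); values are assumptions
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "blue": (255, 0, 0),
    "green": (0, 255, 0),
    "red": (0, 0, 255),
}

def classify_pixels(img):
    """Return an (H, W) array of palette indices plus the palette names."""
    names = list(PALETTE)
    centers = np.array([PALETTE[n] for n in names], dtype=np.float32)  # (K, 3)
    pixels = img.reshape(-1, 3).astype(np.float32)                      # (H*W, 3)
    # Squared Euclidean distance from every pixel to every palette color
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (H*W, K)
    return d2.argmin(axis=1).reshape(img.shape[:2]), names

img = np.array([[[10, 5, 0], [250, 240, 245]],
                [[200, 30, 20], [20, 20, 230]]], dtype=np.uint8)
labels, names = classify_pixels(img)
print([[names[i] for i in row] for row in labels])
```

For very large images the `(H*W, K, 3)` intermediate can be processed in chunks of rows to limit memory.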
-4
votes
0 answers
24 views
Text-based Pattern-Recognition with python [on hold]
I am trying to parse text that comes from OCR scanning with ABBYY FineReader 8.0, so my input is XML files.
What I need to do is find the text patterns on the page I have scanned.
E.g.:
...
0
votes
0 answers
20 views
Bayesian Additive Regression Tree in Python [on hold]
I am creating a model to develop propensity scores for potential customers. I've used logistic regression so far. I read that Bayesian additive regression trees (BART) are a better fit as the data ...
1
vote
0 answers
26 views
Python's implementation of Mutual Information
I am having some issues using the mutual information function that Python's machine learning libraries provide, in particular:
sklearn.metrics.mutual_info_score(labels_true, labels_pred, ...
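One way to check what `mutual_info_score` returns is to compute mutual information by hand from the joint contingency table of two discrete label arrays. The sketch below does that in nats (natural log), which is the unit scikit-learn uses; `mutual_info_manual` is a hypothetical helper name. Note that `mutual_info_score` treats its inputs as discrete labels, so continuous values would first need binning.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mutual_info_manual(a, b):
    """MI in nats from the joint contingency table of two label arrays."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx, b_idx), 1)          # count co-occurrences
    pxy = joint / joint.sum()                    # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)          # marginal p(x), shape (n, 1)
    py = pxy.sum(axis=0, keepdims=True)          # marginal p(y), shape (1, m)
    nz = pxy > 0                                 # 0 * log 0 is treated as 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

x = [0, 0, 1, 1, 2, 2]
y = [0, 0, 1, 1, 1, 1]
print(mutual_info_score(x, y), mutual_info_manual(x, y))
```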
1
vote
1 answer
59 views
How to gridsearch over transform arguments within a pipeline in scikit-learn
My goal is to use one model to select the most important variables and another model to use those variables to make predictions. In the example below I am using two RandomForestClassifiers, but the ...
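A sketch of one way to set this up, assuming `SelectFromModel` is used as the selection step: parameters of a pipeline step are addressed as `<step>__<param>`, and parameters of the model wrapped inside the selector as `<step>__estimator__<param>`. The step names and grid values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

pipe = Pipeline([
    # First forest is only used to rank features by importance
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))),
    # Second forest makes predictions from the selected features
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

param_grid = {
    "select__threshold": ["mean", "median"],      # argument of the transform step itself
    "select__estimator__max_depth": [3, None],    # argument of the model inside the selector
    "clf__max_depth": [3, None],                  # argument of the final classifier
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```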
-2
votes
0 answers
19 views
OpenCV Expectation Maximization: training based on previous training
I am trying to use the expectation maximization functions from OpenCV in Python recursively, that is: I train on a first picture, and then I want to train on successive (very similar) pictures using the ...
1
vote
0 answers
38 views
sklearn: setting the learning rate of SGDClassifier vs LogisticRegression
In sklearn, LogisticRegression (LR for short) has no direct method for solving weighted LR, so I switched to SGDClassifier (SGD).
In my experiment, I generate data following an LR distribution with ...
-1
votes
2 answers
32 views
Python: questions about format in SVM coding
I want to use an SVM for supervised machine learning. My project: given several speeches by Obama and several by Romney, the classifier should decide which speaker gave a speech when we ...
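On the format question: scikit-learn's text tools take a plain list of strings (one per speech) and a parallel list of labels, and the vectorizer turns them into the numeric matrix the SVM needs. A minimal sketch; the speech snippets are hypothetical placeholders, not real quotes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical placeholder snippets; replace with the full speech texts
speeches = [
    "we will create jobs and rebuild the middle class together",
    "hope and change for working families across america",
    "free enterprise and small business will restore our economy",
    "lower taxes and less regulation will get america working",
]
speakers = ["obama", "obama", "romney", "romney"]

# Each speech becomes one TF-IDF row; the labels are the speaker names
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(speeches, speakers)
print(model.predict(["small business and lower taxes"]))
```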
0
votes
0 answers
27 views
Sklearn SGDClassifier partial fit
I'm trying to use SGD to classify a large dataset. As the data is too large to fit into memory, I'd like to use the partial_fit method to train the classifier. I have selected a sample of the dataset ...
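A sketch of out-of-core training with `partial_fit`: the full list of classes must be passed (at least on the first call), because any single chunk may not contain every class. `iter_chunks` is a hypothetical stand-in for reading the real data from disk.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def iter_chunks(n_chunks=10, chunk_size=500, n_features=20, seed=0):
    """Hypothetical stand-in for reading the large dataset piece by piece."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_features)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, n_features))
        yield X, (X @ w > 0).astype(int)

clf = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])  # every class, listed up front
for X_chunk, y_chunk in iter_chunks():
    clf.partial_fit(X_chunk, y_chunk, classes=all_classes)
```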
0
votes
0 answers
36 views
Scikit-learn: categorical variables in regression
I have to run a regression on a DataFrame with categorical variables. What is the difference between using one-hot encoding and using the pandas factorize method? I mean, are there any differences in the ...
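The practical difference for a regression: `pd.factorize` produces one integer column, so a linear model treats the codes as magnitudes with an arbitrary order, while one-hot encoding produces one 0/1 column per category with no implied ordering. A small sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Paris", "London", "Paris", "Tokyo"],
                   "price": [3.0, 2.5, 3.2, 4.1]})

# factorize: one integer column, codes assigned in order of first appearance
codes, uniques = pd.factorize(df["city"])
print(codes)      # [0 1 0 2]

# one-hot: one indicator column per category; drop_first avoids perfect
# collinearity with the intercept in a linear model
dummies = pd.get_dummies(df["city"], prefix="city", drop_first=True)
print(dummies)
```

Tree-based models are less sensitive to the arbitrary ordering of factorized codes than linear models are.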
0
votes
1 answer
36 views
How to perform repeated experiments using Matlab from terminal?
I am working on a machine learning program and attempting to perform experiments on the variables of my neural network. Due to Matlab's prowess with matrices, the learning is being performed in Matlab ...
0
votes
0 answers
18 views
How to use sample weighting in RandomizedSearchCV?
I am working with the scikit-learn library in Python and I want to assign a weight to each sample during cross-validation using RandomizedSearchCV. When I try this code:
search = RandomizedSearchCV(estimator, ...
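A sketch assuming a recent scikit-learn with default settings (metadata routing disabled): extra keyword arguments to the search's `fit` are forwarded to the estimator's `fit`, and sample-aligned arrays such as `sample_weight` are split together with `X` and `y` for each fold. The weighting scheme is hypothetical. Note that the default scoring does not use the weights.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, random_state=0)
weights = np.where(y == 1, 2.0, 1.0)  # hypothetical weighting scheme

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": [0.01, 0.1, 1.0, 10.0]},
    n_iter=4,
    cv=3,
    random_state=0,
)
# sample_weight is passed on to LogisticRegression.fit for each training fold
search.fit(X, y, sample_weight=weights)
print(search.best_params_)
```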
5
votes
3 answers
233 views
Naive Bayes: Imbalanced Test Dataset
I am using scikit-learn Multinomial Naive Bayes classifier for binary text classification (classifier tells me whether the document belongs to the category X or not). I use a balanced dataset to train ...
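If the test data is expected to be imbalanced while training was balanced, one option is to give `MultinomialNB` the expected test-time priors through `class_prior`, which replaces the priors learned from the training labels. The 10% prevalence and documents below are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["cheap pills buy now", "meeting at noon", "buy cheap now", "lunch meeting today"]
labels = [1, 0, 1, 0]  # balanced training set: 1 = category X

vec = CountVectorizer()
X = vec.fit_transform(docs)

# Assumption: category X appears in roughly 10% of real documents.
# Priors are listed in sorted class order: [P(class 0), P(class 1)].
nb = MultinomialNB(class_prior=[0.9, 0.1])
nb.fit(X, labels)
print(nb.predict_proba(vec.transform(["buy pills"])))
```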
0
votes
0 answers
16 views
Theano implementation of Stacked Denoising Autoencoders: why the same input to dA layers?
In the Stacked Denoising Autoencoders tutorial at http://deeplearning.net/tutorial/SdA.html#sda, the pretraining_functions return a list of functions which represent the train function of each dA ...
1
vote
1 answer
109 views
Python non-negative matrix factorization that handles both zeros and missing data?
I am looking for an NMF implementation that has a Python interface and handles both missing data and zeros.
I don't want to impute my missing values before starting the factorization, I want them to be ...
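One technique that does this is weighted (masked) NMF: the reconstruction loss is computed only on observed entries, so observed zeros are fitted as real zeros while missing entries are ignored instead of imputed. A NumPy sketch using Lee-Seung-style multiplicative updates with a mask; `masked_nmf` is a hypothetical helper, not a library function.

```python
import numpy as np

def masked_nmf(V, mask, rank, n_iter=500, eps=1e-9, seed=0):
    """Fit V ~= W @ H using only the entries where mask is True.

    Observed zeros (mask True, value 0) are fitted as zeros; missing entries
    (mask False) contribute nothing to the loss, so they need no imputation.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    Vm = np.where(mask, V, 0.0)  # missing values (e.g. NaN) are never read
    for _ in range(n_iter):
        # Multiplicative updates for ||mask * (V - W H)||_F^2 keep W, H non-negative
        H *= (W.T @ Vm) / (W.T @ (mask * (W @ H)) + eps)
        W *= (Vm @ H.T) / ((mask * (W @ H)) @ H.T + eps)
    return W, H

# Usage: NaN marks missing entries, zeros stay as observed zeros
V = np.array([[1.0, 0.0, np.nan], [2.0, 0.0, 1.0], [np.nan, 1.0, 2.0]])
W, H = masked_nmf(V, ~np.isnan(V), rank=2)
```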
0
votes
1 answer
63 views
Does KNeighborsClassifier compare lists with different sizes?
I have to use scikit-learn's KNeighborsClassifier to compare time series using a user-defined function in Python.
knn = ...
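`KNeighborsClassifier` validates its input as a 2-D numeric array, so series of different lengths cannot be passed directly. One workaround is to compute the pairwise distances yourself and use `metric="precomputed"`. The sketch below uses a plain dynamic time warping (DTW) distance as the user-defined function; the series and labels are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def distance_matrix(rows, cols):
    return np.array([[dtw(r, c) for c in cols] for r in rows])

train = [[0, 1, 2, 3], [0, 1, 2, 3, 4, 5], [5, 4, 3], [6, 5, 4, 3, 2]]
y_train = ["up", "up", "down", "down"]
test = [[1, 2, 3, 4, 5], [4, 3, 2]]

knn = KNeighborsClassifier(n_neighbors=1, metric="precomputed")
knn.fit(distance_matrix(train, train), y_train)     # shape (n_train, n_train)
print(knn.predict(distance_matrix(test, train)))     # shape (n_test, n_train)
```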
1
vote
3 answers
260 views
How to calculate bits per character of a string? (bpc)
A paper I was reading, http://www.cs.toronto.edu/~ilya/pubs/2011/LANG-RNN.pdf, uses bits per character as a test metric for estimating the quality of generative computer models of text but doesn't ...
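Bits per character is the average negative log2 probability the model assigns to each actual next character, $\mathrm{BPC} = -\frac{1}{N}\sum_{i=1}^{N}\log_2 p(c_i \mid c_{<i})$, i.e. the cross-entropy in bits. A sketch where `prob(history, char)` is whatever the model provides (for an RNN, the softmax output for the true next character); the unigram model here is a hypothetical stand-in.

```python
import math
from collections import Counter

def bits_per_character(text, prob):
    """Average -log2 p(c_i | c_<i) over the string; prob(history, char) is the model."""
    total = 0.0
    for i, ch in enumerate(text):
        total -= math.log2(prob(text[:i], ch))
    return total / len(text)

# Hypothetical model: a context-free unigram distribution from training text
train = "abracadabra"
counts = Counter(train)
unigram = lambda history, ch: counts[ch] / len(train)

print(bits_per_character("abra", unigram))
```

A model that is uniform over 4 symbols scores exactly 2 bits per character, which is a handy sanity check.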
0
votes
0 answers
34 views
Nested cross-validation in grid search for precomputed kernels in scikit-learn
I have a precomputed kernel of size NxN. I am using GridSearchCV to tune the C parameter of an SVM with kernel='precomputed' as follows:
C_range = 10. ** np.arange(-2, 9)
param_grid = dict(C=C_range)
grid = ...
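A nested cross-validation sketch, assuming a recent scikit-learn: `SVC(kernel="precomputed")` is flagged as pairwise, and `GridSearchCV` inherits that flag, so both the inner and outer loops slice the kernel as `K[train][:, train]` for fitting and `K[test][:, train]` for prediction. A linear kernel on synthetic data stands in for the real kernel, and the C range is shortened for speed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=10, random_state=0)
K = X @ X.T  # stand-in for the precomputed N x N kernel (linear kernel here)

C_range = 10.0 ** np.arange(-2, 3)
inner = GridSearchCV(SVC(kernel="precomputed"), {"C": C_range},
                     cv=StratifiedKFold(3, shuffle=True, random_state=0))
outer = StratifiedKFold(5, shuffle=True, random_state=1)

# The outer loop scores the whole tuning procedure on folds it never saw
scores = cross_val_score(inner, K, y, cv=outer)
print(scores.mean())
```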
0
votes
0 answers
23 views
Plot individual decision boundary for a neuron in feedforward ANN
I have a feedforward neural network with a single hidden layer which I generate using pybrain (I do not insist on using it, any tool will do as long as it solves my problem). It consists of a linear ...
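For a network with two inputs, each hidden neuron with weights $w$ and bias $b$ has a linear decision boundary $w_1 x_1 + w_2 x_2 + b = 0$: the point where its pre-activation is zero (output 0.5 for a sigmoid, 0 for tanh). A sketch that computes points on that line; the weight values are hypothetical and would come from however you extract them from the trained network.

```python
import numpy as np

def neuron_boundary(w, b, x1_range=(-1.0, 1.0), num=50):
    """Points on the line w[0]*x1 + w[1]*x2 + b = 0 for a neuron with 2 inputs."""
    if np.isclose(w[1], 0.0):
        raise ValueError("vertical boundary: x1 = -b / w[0]")
    x1 = np.linspace(*x1_range, num)
    x2 = -(w[0] * x1 + b) / w[1]
    return x1, x2

# Hypothetical hidden-layer weights, shape (n_hidden, 2), and biases, shape (n_hidden,)
W_hidden = np.array([[1.0, -1.0], [0.5, 2.0]])
b_hidden = np.array([0.0, -1.0])
for w, b in zip(W_hidden, b_hidden):
    x1, x2 = neuron_boundary(w, b)
    # e.g. matplotlib.pyplot.plot(x1, x2) draws this neuron's boundary
```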
0
votes
0 answers
41 views
TypeError: fit() takes exactly 3 arguments (2 given) with sklearn and sklearn_pandas
I'm trying to use the sklearn_pandas module to extend the work I do in pandas and dip a toe into machine learning, but I'm struggling with an error I don't really understand how to fix.
I was working ...
0
votes
0 answers
15 views
ZeroDivisionError using deepnet in python
Is there any tutorial or guideline on how to systematically use the deepnet library for other datasets?
I wrote code myself that generates the training and testing npy datasets and the pbtxt file. ...
2
votes
2 answers
44 views
How to make the basic inverted index program more pythonic
I have the following code for an invertedIndex. However, I'm not too satisfied with it and was wondering how it can be made more compact and Pythonic.
class invertedIndex(object):
def ...
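A compact sketch of the same idea using `collections.defaultdict(set)`, which removes the usual "check if the key exists" boilerplate; the class and method names are hypothetical, since the original code is truncated. Queries intersect the posting sets (AND semantics).

```python
from collections import defaultdict

class InvertedIndex:
    """Map each term to the set of document ids that contain it."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self._index[term].add(doc_id)

    def search(self, *terms):
        """Return ids of documents containing all the given terms (AND query)."""
        sets = [self._index.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()

idx = InvertedIndex()
idx.add(1, "the quick brown fox")
idx.add(2, "the lazy dog")
idx.add(3, "quick dog")
print(sorted(idx.search("quick", "dog")))   # [3]
```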
0
votes
1 answer
50 views
Classifying new occurrences - Multinomial Naive Bayes
So I have currently trained a Multinomial Naive Bayes classifier using scikit-learn.
Now what I can do is classify test data by using predict.
But if I want to run this every night, as a script, I ...
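A common pattern for a nightly script is to train once, save the fitted vectorizer and classifier together with `joblib`, and load them in the script to call `predict` on new text. The file location and training data are hypothetical. Pickled models should be loaded with the same scikit-learn version they were saved with.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Train once; keeping vectorizer and classifier in one pipeline means the
# nightly script reuses exactly the same vocabulary
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(["good product", "bad service", "great product", "awful service"],
          ["pos", "neg", "pos", "neg"])

path = os.path.join(tempfile.mkdtemp(), "nb_model.joblib")  # hypothetical location
joblib.dump(model, path)

# In the nightly script: load the saved model and classify the new occurrences
loaded = joblib.load(path)
print(loaded.predict(["great service"]))
```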
0
votes
2 answers
57 views
pylearn2's show_weights.py: 'str' object has no attribute 'get_weights_view'
edit: The bug was resolved in PR 1012.
I'm having trouble running show_weights.py cifar_grbm_smd.pkl in step 3 of the quick start tutorial, which returns:
... in weights_view = ...
0
votes
1 answer
20 views
Ensemble learning with 2 classifiers
I'm trying to combine two approaches to classifying my data: one comes from an SVM, and the other from an external classifier that outputs one or more labels for what it thinks the observation point is. Is it ...
0
votes
0 answers
25 views
Give weights to features [closed]
I want to perform text classification on tweets and give more importance to hashtags than to the other words. How exactly do I do that? I'm using scikit-learn and it has an option to show the ...
1
vote
2 answers
2k views
Documentation for libsvm in python
Is there any good documentation for libsvm in Python with a few non-trivial examples that explain what each of the flags means, and how data can be trained and tested end to end?
(There is no ...
3
votes
1 answer
476 views
Python : How to find Accuracy Result in SVM Text Classifier Algorithm for Multilabel Class
I need to check the accuracy on X_train and X_test.
The following code works for me in my multi-label classification problem:
import numpy as np
from ...
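A sketch of measuring accuracy for a multi-label SVM text classifier, with hypothetical documents and labels: labels are binarized with `MultiLabelBinarizer`, and `accuracy_score` on multi-label input is subset accuracy (all labels of a sample must match), so `1 - hamming_loss` is often reported alongside it as a per-label measure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

X_train = ["new york city", "london bridge", "new york and london", "paris museum",
           "paris and london", "york minster"]
y_train = [["us"], ["uk"], ["us", "uk"], ["fr"], ["fr", "uk"], ["uk"]]
X_test = ["london eye", "new york paris"]
y_test = [["uk"], ["us", "fr"]]

mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train)   # one 0/1 column per label
Y_test = mlb.transform(y_test)

clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
clf.fit(X_train, Y_train)

for name, X, Y in [("train", X_train, Y_train), ("test", X_test, Y_test)]:
    Y_pred = clf.predict(X)
    # subset accuracy, then the fraction of individual labels predicted correctly
    print(name, accuracy_score(Y, Y_pred), 1 - hamming_loss(Y, Y_pred))
```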
0
votes
1 answer
34 views
Online version of scikit-learn's TfidfVectorizer
I'm looking to use scikit-learn's HashingVectorizer because it's a great fit for online learning problems (new tokens in text are guaranteed to map to a "bucket"). Unfortunately the implementation ...
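One possible workaround, sketched here as a hypothetical helper rather than a scikit-learn feature: keep a running document-frequency count per hash bucket on top of `HashingVectorizer`, and apply the same smoothed IDF formula that `TfidfTransformer(smooth_idf=True)` uses, $\mathrm{idf} = \ln\frac{1+n}{1+\mathrm{df}} + 1$.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.preprocessing import normalize

class OnlineTfidf:
    """Hypothetical helper: TF-IDF over hashed features with running document frequencies."""

    def __init__(self, n_features=2 ** 18):
        self.hasher = HashingVectorizer(n_features=n_features, alternate_sign=False, norm=None)
        self.df = np.zeros(n_features)
        self.n_docs = 0

    def partial_fit(self, docs):
        counts = self.hasher.transform(docs)
        # Each document adds 1 to the document frequency of every bucket it touches
        self.df += np.asarray((counts > 0).sum(axis=0)).ravel()
        self.n_docs += counts.shape[0]
        return self

    def transform(self, docs):
        counts = self.hasher.transform(docs)
        idf = np.log((1 + self.n_docs) / (1 + self.df)) + 1
        # Scale each bucket by its IDF, then L2-normalize each row
        return normalize(counts @ sparse.diags(idf))

tfidf = OnlineTfidf()
tfidf.partial_fit(["apple banana", "apple cherry"]).partial_fit(["apple date"])
row = tfidf.transform(["apple banana"])
```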
0
votes
0 answers
36 views
Time series forecasting with support vector regression
I'm trying to perform a simple time series prediction using support vector regression.
I am trying to understand the answer provided here.
I adapted Tom's code to reflect the answer provided:
...
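A common framing for time series prediction with SVR is to turn the series into a supervised problem with lagged windows: each row of `X` holds the previous `n_lags` values and the target is the next value. A sketch on a synthetic sine series; `make_lagged` and the hyperparameters are hypothetical choices. The split keeps time order so the model is tested only on the future.

```python
import numpy as np
from sklearn.svm import SVR

def make_lagged(series, n_lags):
    """Turn a 1-D series into (X, y): each row holds n_lags past values, y the next one."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = np.asarray(series[n_lags:])
    return X, y

t = np.arange(200)
series = np.sin(0.1 * t)  # hypothetical series
n_lags = 5
X, y = make_lagged(series, n_lags)

split = 150  # train on the past, test on the future
model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:split], y[:split])
pred = model.predict(X[split:])
print(np.mean(np.abs(pred - y[split:])))

# One-step-ahead forecast beyond the end of the data
next_value = model.predict(series[-n_lags:].reshape(1, -1))
```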
1
vote
1 answer
62 views
Defining a gradient with respect to a subtensor in Theano
I have what is conceptually a simple question about Theano but I haven't been able to find the answer (I'll confess upfront to not really understanding how shared variables work in Theano, despite ...
69
votes
12 answers
12k views
How can I build a model to distinguish tweets about Apple (Inc.) from tweets about apple (fruit)?
See below for 50 tweets about "apple." I have hand labeled the positive matches about Apple Inc. They are marked as 1 below.
Here are a couple of lines:
1|“@chrisgilmer: Apple targets big business ...
0
votes
0 answers
17 views
Run Mclust in Python via rpy2 package
I was trying to run the mclust package in Python via rpy2. I ran into the problem of not being able to access the results in Python. In R, to apply Mclust, I would do the following (a simple example):
...
2
votes
0 answers
35 views
scikit-learn svm module and predict function not working
I am trying to get an SVM to work using scikit-learn but cannot get the results I am expecting. I would like to use k-means to classify roughly 2-5 data clusters and then use an SVM to build a model ...
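A minimal sketch of the workflow described: k-means assigns a cluster id to each point, an SVM is trained on those ids, and new points are classified with `predict`, which expects a 2-D array of shape `(n_samples, n_features)`. The blob data and new points are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [0, 5]], random_state=0)

# Step 1: let k-means assign a cluster id to every point
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Step 2: train the SVM on those ids so it can label new points
svm = SVC(kernel="rbf", gamma="scale").fit(X, cluster_ids)

new_points = np.array([[0.2, 0.1], [4.8, 5.1]])  # always 2-D, even for one point
print(svm.predict(new_points), kmeans.predict(new_points))
```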
0
votes
0 answers
45 views
Is there any way to identify a person's title through NLTK?
I'd like to be able to extract the title or job position of a person from a short description.
For example:
Assistant professor in University of California.
Owner of car shop in San Francisco, CA.
...
0
votes
1 answer
32 views
Gradient descent not working as expected
I am using Stochastic Gradient Descent from scikit-learn http://scikit-learn.org/stable/modules/sgd.html. The example given in the link works like this:
>>> from sklearn.linear_model import ...
1
vote
1 answer
32 views
pandas: groupby and unstack to create feature vector for classification
I have a pandas dataframe displaying users' performance on test questions. It looks like this:
userID questionID correct
-------------------------------
1 1 1
1 ...
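A sketch of building one feature row per user: index by `(userID, questionID)` and `unstack` the question level into columns. The values are hypothetical, and `-1` for unanswered questions is an assumed sentinel. If a user can answer the same question twice, `pivot_table` with an `aggfunc` would be needed instead.

```python
import pandas as pd

df = pd.DataFrame({
    "userID":     [1, 1, 1, 2, 2, 3],
    "questionID": [1, 2, 3, 1, 3, 2],
    "correct":    [1, 0, 1, 1, 0, 1],
})

# One row per user, one column per question; unanswered questions become -1
features = (
    df.set_index(["userID", "questionID"])["correct"]
      .unstack(fill_value=-1)
)
print(features)
```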
0
votes
0 answers
32 views
Optimising accuracy for OneClassSVM
I have a problem that requires a one-class classification system. I am currently developing in Python, and as a result I am using scikit-learn for machine learning tasks.
From ...
1
vote
1 answer
35 views
Multi variable gradient descent
I am learning gradient descent for calculating coefficients. Below is what I am doing:
#!/usr/bin/python
import numpy as np
# m denotes the number of examples here, not the number of features
...
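A vectorized sketch of batch gradient descent for multi-variable linear regression, minimizing $\frac{1}{2m}\lVert X\theta - y\rVert^2$: a column of ones is prepended for the intercept and all coefficients are updated at once. The data is synthetic; with features on very different scales, the learning rate `alpha` would need to be smaller (or the features standardized first).

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iter=1000):
    """Batch gradient descent for linear regression, vectorized over all features."""
    m = X.shape[0]                          # m = number of examples
    Xb = np.hstack([np.ones((m, 1)), X])    # prepend a column of ones for the intercept
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        error = Xb @ theta - y              # shape (m,)
        grad = Xb.T @ error / m             # gradient of (1/2m) * sum(error**2)
        theta -= alpha * grad
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 4.0 + X @ np.array([2.0, -1.0, 0.5])
theta = gradient_descent(X, y)
print(theta)  # close to [4, 2, -1, 0.5]
```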
0
votes
2 answers
29 views
Mclust (R) equivalent package in Python
Is there an Mclust equivalent command or mclust equivalent package in Python? I searched the documentation for sklearn. It has GMM for classification, not for clustering.
I have installed rpy2, but I ...
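scikit-learn's `GaussianMixture` can be used for clustering; a rough analogue of Mclust's model selection is to fit several component counts and covariance structures and keep the model with the best BIC. In scikit-learn `bic()` is lower-is-better, while mclust reports BIC with the opposite sign (higher is better), and scikit-learn offers only four covariance types versus mclust's larger family. A sketch on hypothetical blob data:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=400, centers=[[0, 0], [6, 0], [0, 6]], random_state=0)

best = None
for cov in ["spherical", "diag", "tied", "full"]:
    for k in range(1, 7):
        gmm = GaussianMixture(n_components=k, covariance_type=cov, random_state=0).fit(X)
        bic = gmm.bic(X)
        if best is None or bic < best[0]:
            best = (bic, k, cov, gmm)

bic, k, cov, model = best
labels = model.predict(X)   # cluster assignment for each point
print(k, cov)
```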
1
vote
1 answer
33 views
Understanding format of data in scikit-learn
I am trying to work with multi-label text classification using scikit-learn in Python 3.x. I have data in libsvm format which I am loading using load_svmlight_file module. The data format is like ...
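In the multi-label svmlight/libsvm format each line is `label1,label2 index:value ...`. With `multilabel=True`, `load_svmlight_file` returns a sparse feature matrix and a list of label tuples, which `MultiLabelBinarizer` turns into the indicator matrix scikit-learn's multi-label estimators expect. A sketch with a tiny in-memory file (file-like objects must be opened in binary mode):

```python
import io

from sklearn.datasets import load_svmlight_file
from sklearn.preprocessing import MultiLabelBinarizer

# Two samples: the first has labels 1 and 3, the second has label 2
data = io.BytesIO(b"1,3 1:0.5 4:1.0\n2 2:0.25 3:2.0\n")

X, y = load_svmlight_file(data, multilabel=True)
print(X.shape)   # (2, 4): indices were detected as one-based and shifted to columns 0..3
print(y)         # one tuple of labels per sample

Y = MultiLabelBinarizer().fit_transform(y)   # shape (n_samples, n_labels)
```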
0
votes
1 answer
63 views
What does “The sum of true positives and false positives are equal to zero for some labels.” mean?
I'm using scikit-learn to perform cross-validation with StratifiedKFold to compute the F1 score, but it warns that for some of my labels the sum of true positives and false positives is equal to ...
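The warning means that in some fold the classifier never predicted a particular label, so precision for that label is 0/0; scikit-learn sets it to 0 and warns, which also lowers a macro-averaged F1. A small reproduction, assuming scikit-learn 0.22 or later for the `zero_division` parameter:

```python
import warnings
from sklearn.metrics import f1_score, precision_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 1, 1, 0]   # label 2 is never predicted: TP + FP = 0 for it

# Precision for label 2 is 0/0; by default it is set to 0 with a warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    per_label = precision_score(y_true, y_pred, average=None)
print(per_label, len(caught) > 0)

# zero_division makes the choice explicit and suppresses the warning
print(f1_score(y_true, y_pred, average="macro", zero_division=0))
```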
10
votes
3 answers
6k views
Using frequent itemset mining to build association rules?
I am new to this area as well as its terminology, so please feel free to correct me if I go wrong somewhere. I have two datasets like this:
Dataset 1:
A B C 0 E
A 0 C 0 0
A 0 C D E
A 0 C 0 E
The way I ...
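A plain-Python Apriori sketch for Dataset 1, assuming a `0` in a row means "item absent" (the question is truncated, so this reading is an assumption). Frequent itemsets are grown level by level, and rules `X -> Y` are kept when confidence = support(X union Y) / support(X) meets a threshold.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Apriori: grow itemsets level by level, keeping those with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    support = {}
    level = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while level:
        counts = {c: sum(c <= t for t in transactions) / n for c in level}
        kept = {c: s for c, s in counts.items() if s >= min_support}
        support.update(kept)
        k += 1
        # Size-k candidates whose every (k-1)-subset was frequent
        level = {a | b for a in kept for b in kept if len(a | b) == k}
        level = {c for c in level if all(frozenset(s) in kept for s in combinations(c, k - 1))}
    return support

def rules(support, min_confidence):
    """Association rules (lhs, rhs, confidence) from the frequent itemsets."""
    out = []
    for itemset, s in support.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / support[lhs]
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out

rows = ["A B C 0 E", "A 0 C 0 0", "A 0 C D E", "A 0 C 0 E"]   # Dataset 1
transactions = [[x for x in row.split() if x != "0"] for row in rows]
freq = frequent_itemsets(transactions, min_support=0.5)
print(rules(freq, min_confidence=0.9))
```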
10
votes
6 answers
3k views
Is there a good and easy way to visualize high dimensional data?
Can someone please tell me if there is a good (easy) way to visualize high dimensional data? My data currently has 21 dimensions, but I would like to see whether it is dense or sparse. Are there ...
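A quick first look is a linear projection to two dimensions with PCA, after standardizing so no single dimension dominates; `explained_variance_ratio_` tells how much of the structure the 2-D picture actually captures. Random data stands in for the real 21-dimensional set. Non-linear embeddings such as `sklearn.manifold.TSNE` are an alternative when PCA captures little variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 21))   # stand-in for the 21-dimensional data

X_std = StandardScaler().fit_transform(X)   # put all dimensions on the same scale
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

# Fraction of total variance visible in the 2-D projection
print(pca.explained_variance_ratio_.sum())
# e.g. matplotlib: plt.scatter(X_2d[:, 0], X_2d[:, 1], s=5)
```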
0
votes
0 answers
38 views
Multilabel grid search in scikit-learn
I am new to scikit-learn and I want to find the best parameters for a multi-label classification problem with scikit-learn's GridSearchCV. I cannot get it working and I am pretty sure there is something ...
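A sketch of grid search over a multi-label model, with synthetic data: when the base classifier is wrapped in `OneVsRestClassifier`, its parameters are addressed as `estimator__<name>`, and a multi-label-aware scorer such as `"f1_micro"` is chosen explicitly. The grid values are hypothetical.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, Y = make_multilabel_classification(n_samples=200, n_classes=4, random_state=0)

search = GridSearchCV(
    OneVsRestClassifier(LinearSVC(max_iter=5000)),
    param_grid={"estimator__C": [0.01, 0.1, 1.0]},   # parameter of the wrapped SVM
    scoring="f1_micro",
    cv=3,
)
search.fit(X, Y)
print(search.best_params_)
```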
2
votes
1 answer
54 views
How do you estimate the performance of a classifier on test data?
I'm using scikit to make a supervised classifier and I am currently tuning it to give me good accuracy on the labeled data. But how do I estimate how well it does on the test data (unlabeled)?
Also, ...
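Accuracy cannot be measured on unlabeled data directly; the usual estimate comes from labeled data the model was not trained on, for example with k-fold cross-validation, under the assumption that the unlabeled data follows the same distribution. If the same folds are also used for tuning, the estimate is optimistic, so a separate held-out set or nested cross-validation is safer. A sketch on the bundled iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each of the 5 folds is scored on data the model did not see during training;
# the spread across folds hints at how stable the estimate is
scores = cross_val_score(SVC(C=1.0), X, y, cv=5)
print(scores.mean(), scores.std())
```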
0
votes
1 answer
38 views
Is it possible to reverse the transformation of KMeans in sklearn?
After clustering a dataset and then transforming the data to the distance from the centroids using sklearn.cluster.KMeans, is it possible to reverse the transformation, given the centroids, getting ...
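In general yes, by multilateration: $\lVert x - c_i\rVert^2 = d_i^2$ for each centroid, and subtracting the equation for $c_0$ from the others gives the linear system $2(c_i - c_0)\cdot x = d_0^2 - d_i^2 + \lVert c_i\rVert^2 - \lVert c_0\rVert^2$. A unique solution needs at least `n_features + 1` centroids in general position. `invert_kmeans_transform` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans

def invert_kmeans_transform(D, centers):
    """Recover points from their distances to the centroids by least squares."""
    c0 = centers[0]
    A = 2 * (centers[1:] - c0)                                  # (k-1, n_features)
    sq = (centers ** 2).sum(axis=1)
    rhs = D[:, [0]] ** 2 - D[:, 1:] ** 2 + (sq[1:] - sq[0])      # (n_samples, k-1)
    X_hat, *_ = np.linalg.lstsq(A, rhs.T, rcond=None)
    return X_hat.T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
D = km.transform(X)                          # distances to each centroid
X_back = invert_kmeans_transform(D, km.cluster_centers_)
print(np.abs(X_back - X).max())              # ~0 up to floating-point error
```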
3
votes
2 answers
36 views
Performance issue in computing multiple linear regression with huge data sets
I am using np.linalg.lstsq to calculate the multiple linear regression. My data set is huge: it has 20,000 independent variables (X) and 1 dependent variable (Y). Each independent variable has 10,000 ...
0
votes
1 answer
52 views
Saving a feature vector for new data in scikit-learn
To create a machine learning algorithm I made a list of dictionaries and used scikit's DictVectorizer to make a feature vector for each item. I then created an SVM model from a dataset using part of ...
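The key is to keep the fitted `DictVectorizer` and call `transform` (not `fit_transform`) on new data, so the columns line up with what the SVM was trained on; features never seen during fitting are dropped. The dictionaries below are hypothetical. The vectorizer can be saved alongside the model (e.g. with `joblib`) for later runs.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

train_dicts = [{"color": "red", "size": 3}, {"color": "blue", "size": 1},
               {"color": "red", "size": 2}, {"color": "green", "size": 5}]
y = [1, 0, 1, 0]

vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train_dicts)    # learns the feature -> column mapping once
model = SVC().fit(X_train, y)

# New data: transform with the *same* vectorizer; the unseen "weight" key is ignored
new = [{"color": "blue", "size": 4, "weight": 7}]
X_new = vec.transform(new)
print(vec.feature_names_, X_new, model.predict(X_new))
```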