Recently Active 'machine-learning python' Questions

1

vote

1answer

7 views

How to handle categorical variables in sklearn GradientBoostingClassifier?

I am attempting to train models with GradientBoostingClassifier using categorical variables. The following is a primitive code sample, just for trying to input categorical variables into ...

modified 3 hours ago

larsmans
165k17237403

2

votes

2answers

44 views

Efficient way to cluster colors using K-Nearest

I am trying to cluster the colors in an image into predefined classes (black, white, blue, green, red). I'm using the following code: import numpy as np import cv2 src = cv2.imread('objects.png') ...

python opencv numpy machine-learning

modified yesterday

ivan_a
34739

-4

votes

0answers

24 views

Text-based Pattern-Recognition with python [on hold]

I am trying to parse text that comes from OCR scanning with ABBYY FineReader 8.0, so my input is XML files. What I need to do is find the text patterns on the page I have scanned. E.g. ...

python machine-learning data-mining pattern-recognition text-analysis

modified yesterday

themisomagos
1

0

votes

0answers

20 views

Bayesian Additive Regression Tree in Python [on hold]

Community I am creating a model to develop propensity scores for potential customers. I've used logistic regression so far. I read that Bayesian additive regression tree is a better fit as the data ...

python machine-learning bayesian

modified yesterday

ppwt
424

1

vote

0answers

26 views

Python's implementation of Mutual Information

I am having some issues implementing the Mutual Information Function that Python's machine learning libraries provide, in particular : sklearn.metrics.mutual_info_score(labels_true, labels_pred, ...

python machine-learning feature-selection

modified yesterday

Andreas
587

1

vote

1answer

59 views

How to gridsearch over transform arguments within a pipeline in scikit-learn

My goal is to use one model to select the most important variables and another model to use those variables to make predictions. In the example below I am using two RandomForestClassifiers, but the ...

python machine-learning scikit-learn pipeline

modified Jul 9 at 17:21

kipola
9613

-2

votes

0answers

19 views

OpenCV Expectation Maximization: training based on previous training

I am trying to use the expectation maximization functions from OpenCV in Python recursively, that is: I train on a first picture, and then I want to train on successive (very similar) pictures using the ...

python opencv machine-learning

modified Jul 9 at 11:38

user3020849
605

1

vote

0answers

38 views

sklearn setting learning rate of SGDClassifier vs LogisticRegression

In sklearn, LogisticRegression (LR for short) has no direct method for solving weighted LR, so I switched to SGDClassifier (SGD). In my experiment, I generate data following an LR distribution with ...

python algorithm machine-learning scikit-learn logistic-regression

modified Jul 9 at 10:54

user3817128
62

-1

votes

2answers

32 views

Python: questions about format in SVM coding

I want to use svm to do supervised machine learning. My project is: Given Obama's several speeches, and Romney's several speeches, the classifier can decide which speaker spoke this speech when we ...

python machine-learning scikit-learn svm

modified Jul 8 at 18:28

larsmans
165k17237403

0

votes

0answers

27 views

Sklearn SGDClassifier partial fit

I'm trying to use SGD to classify a large dataset. As the data is too large to fit into memory, I'd like to use the partial_fit method to train the classifier. I have selected a sample of the dataset ...

python machine-learning scikit-learn gradient-descent

modified Jul 8 at 11:55

larsmans
165k17237403

0

votes

0answers

36 views

Scikit-learn Categorical variables in regression

I have to run a regression on a DataFrame with categorical variables. What is the difference between using one-hot encoding and using the pandas factorize method? I mean, is there any difference in the ...

python pandas machine-learning scikit-learn

modified Jul 7 at 18:53

user3468556
1

0

votes

1answer

36 views

How to perform repeated experiments using Matlab from terminal?

I am working on a machine learning program and attempting to perform experiments on the variables of my neural network. Due to Matlab's prowess with matrices, the learning is being performed in Matlab ...

python matlab terminal machine-learning

modified Jul 7 at 18:42

Eli Duenisch
23717

0

votes

0answers

18 views

How to use sample weighting in RandomizedSearchCV?

I am working with the scikit-learn library in Python and I want to assign a weight to each sample during cross-validation using RandomizedSearchCV. When I try this code: search = RandomizedSearchCV(estimator, ...

python machine-learning scikit-learn cross-validation

modified Jul 7 at 18:18

boomz
400216

5

votes

3answers

233 views

Naive Bayes: Imbalanced Test Dataset

I am using scikit-learn Multinomial Naive Bayes classifier for binary text classification (classifier tells me whether the document belongs to the category X or not). I use a balanced dataset to train ...

python machine-learning classification scikit-learn text-classification

modified Jul 7 at 16:27

etov
1,293413

0

votes

0answers

16 views

Theano implementation of Stacked DenoisingAutoencoders - Why same input to dA layers?

In the tutorial Stacked DenoisingAutoencoders on http://deeplearning.net/tutorial/SdA.html#sda, the pretraining_functions return a list of functions which represent the train function of each dA ...

python machine-learning neural-network theano autoencoder

modified Jul 7 at 14:18

rayryeng
4,9461925

1

vote

1answer

109 views

Python Non negative Matrix Factorization that handles both zeros and missing data?

I am looking for an NMF implementation that has a Python interface and handles both missing data and zeros. I don't want to impute my missing values before starting the factorization, I want them to be ...

python machine-learning scikit-learn collaborative-filtering matrix-factorization

modified Jul 7 at 9:59

Tautvydas
444513

0

votes

1answer

63 views

Does KNeighborsClassifier compare lists with different sizes?

I have to use scikit-learn's KNeighborsClassifier to compare time series using a user-defined function in Python. knn = ...

python machine-learning time-series scikit-learn knn

modified Jul 6 at 20:21

user41047
1762

1

vote

3answers

260 views

How to calculate bits per character of a string? (bpc)

A paper I was reading, http://www.cs.toronto.edu/~ilya/pubs/2011/LANG-RNN.pdf, uses bits per character as a test metric for estimating the quality of generative computer models of text but doesn't ...

python algorithm machine-learning nlp entropy

modified Jul 6 at 20:04

Ozgur Yilmaz
1

0

votes

0answers

34 views

Nested cross-validation in grid search for precomputed kernels in scikit-learn

I have a precomputed kernel of size NxN. I am using GridSearchCV to tune C parameter of SVM with kernel='precomputed' as follows: C_range = 10. ** np.arange(-2, 9) param_grid = dict(C=C_range) grid = ...

python machine-learning scikit-learn

modified Jul 6 at 12:36

user3733188
42

0

votes

0answers

23 views

Plot individual decision boundary for a neuron in feedforward ANN

I have a feedforward neural network with a single hidden layer which I generate using pybrain (I do not insist on using it, any tool will do as long as it solves my problem). It consists of a linear ...

python matplotlib machine-learning neural-network pybrain

modified Jul 6 at 11:04

Dainis Boumber
16

0

votes

0answers

41 views

TypeError: fit() takes exactly 3 arguments (2 given) with sklearn and sklearn_pandas

I'm trying to use the sklearn_pandas module to extend the work I do in pandas and dip a toe into machine learning but I'm struggling with an error I don't really understand how to fix. I was working ...

python pandas machine-learning scikit-learn

modified Jul 5 at 18:59

Korem
878319

0

votes

0answers

15 views

ZeroDivisionError using deepnet in python

Is there any tutorial or guideline on how to systematically use deepnet library for other datasets? I developed a code myself that generates the training and testing npy datasets and the pbtxt file. ...

python machine-learning deep-learning

modified Jul 4 at 19:44

iBM
257

2

votes

2answers

44 views

How to make the basic inverted index program more pythonic

I have code for an invertedIndex as follows. However I'm not too satisfied with it and was wondering how it can be made more compact and pythonic class invertedIndex(object): def ...

python machine-learning nlp inverted-index

modified Jul 4 at 1:02

shavenwarthog
2,2561711

0

votes

1answer

50 views

Classifying new occurrences - Multinomial Naive Bayes

So I have currently trained a Multinomial Naive Bayes classifier using scikit-learn. Now what I can do is classify test data by using predict. But if I want to run this every night, as a script, I ...

python machine-learning classification scikit-learn

modified Jul 3 at 18:38

larsmans
165k17237403

0

votes

2answers

57 views

pylearn2's show_weights.py: 'str' object has no attribute 'get_weights_view'

edit: The bug was resolved in PR 1012. I'm having trouble running show_weights.py cifar_grbm_smd.pkl in step 3 of the quick start tutorial, which returns: ... in weights_view = ...

python machine-learning

modified Jul 3 at 17:54

Emre
4721415

0

votes

1answer

20 views

Ensemble learning with 2 classifiers

I'm trying to combine 2 approaches to classifying my data, one comes from a SVM and another external classifier that gives out one or more labels as to what it thinks the observation point is. Is it ...

python machine-learning scikit-learn

modified Jul 3 at 9:13

Kyle Kastner
2164

0

votes

0answers

25 views

Give weights to features [closed]

I want to perform text classification on tweets, I want to give more importance to the hash tags than the other words, how exactly do I do that? I'm using scikit and it has an option to show the ...

python machine-learning scikit-learn

modified Jul 3 at 4:09

user3666471
213

1

vote

2answers

2k views

Documentation for libsvm in python

Is there any good documentation for libsvm in Python with a few non-trivial examples that explain what each of the flags means, and how data can be trained and tested from end to end? (There is no ...

python machine-learning svm libsvm

modified Jul 2 at 10:47

jabaldonedo
5,86121234

3

votes

1answer

476 views

Python : How to find Accuracy Result in SVM Text Classifier Algorithm for Multilabel Class

I have used following set of code: And I need to check accuracy of X_train and X_test The following code works for me in my classification problem over multi-labeled class import numpy as np from ...

python machine-learning svm scikit-learn svc

modified Jul 2 at 7:12

Community♦
1

0

votes

1answer

34 views

Online version of scikit-learn's TfidfVectorizer

I'm looking to use scikit-learn's HashingVectorizer because it's a great fit for online learning problems (new tokens in text are guaranteed to map to a "bucket"). Unfortunately the implementation ...

python machine-learning nlp scikit-learn vectorization

modified Jul 1 at 22:03

Raff.Edward
1,59029

0

votes

0answers

36 views

Time series forecasting with support vector regression

I'm trying to perform a simple time series prediction using support vector regression. I am trying to understand the answer provided here. I adapted Tom's code to reflect the answer provided: ...

python machine-learning time-series scikit-learn regression

modified Jul 1 at 19:33

Pythontology
12

1

vote

1answer

62 views

Defining a gradient with respect to a subtensor in Theano

I have what is conceptually a simple question about Theano but I haven't been able to find the answer (I'll confess upfront to not really understanding how shared variables work in Theano, despite ...

python machine-learning theano

modified Jul 1 at 10:45

user3054726
83

69

votes

12answers

12k views

How can I build a model to distinguish tweets about Apple (Inc.) from tweets about apple (fruit)?

See below for 50 tweets about "apple." I have hand labeled the positive matches about Apple Inc. They are marked as 1 below. Here are a couple of lines: 1|“@chrisgilmer: Apple targets big business ...

java python r machine-learning classification

modified Jul 1 at 7:09

Gunjan
1908

0

votes

0answers

17 views

Run Mclust in Python via rpy2 package

I was trying to run the mclust package in Python via rpy2. I ran into the problem of not being able to access the results in Python. In R, to apply Mclust, I would do the following (a simple example): ...

python r machine-learning statistics cluster-analysis

modified Jun 30 at 14:31

user2498497
538

2

votes

0answers

35 views

scikit-learn svm module and predict function not working

I am trying to get an SVM to work using scikit-learn but cannot get the results I am expecting. I would like to use k-means to classify roughly 2-5 data clusters and then use an SVM to build a model ...

python machine-learning scikit-learn svm k-means

modified Jun 29 at 17:54

Pholotic
113

0

votes

0answers

45 views

Is there any way to identify a person's title through NLTK?

I'd like to be able to extract the title or job position of a person from a short description. For example: Assistant professor in University of California. Owner of car shop in San Francisco,CA. ...

python algorithm machine-learning nltk

modified Jun 28 at 18:12

jonrsharpe
27.4k41231

0

votes

1answer

32 views

Gradient descent not working as expected

I am using Stochastic Gradient Descent from scikit-learn http://scikit-learn.org/stable/modules/sgd.html. The example given in the link works like this: >>> from sklearn.linear_model import ...

python machine-learning scipy linear-regression gradient-descent

modified Jun 28 at 9:40

lejlot
11.8k2830

1

vote

1answer

32 views

pandas: groupby and unstack to create feature vector for classification

I have a pandas dataframe displaying users' performance on test questions. It looks like this: userID questionID correct ------------------------------- 1 1 1 1 ...

python pandas machine-learning

modified Jun 27 at 6:29

Andy Hayden
37.3k104994

0

votes

0answers

32 views

Optimising accuracy for OneClassSVM

I have a problem which requires the use of a one-class classification system. I am currently using Python for development, and as a result I am using scikit-learn for machine learning tasks. From ...

python machine-learning scikit-learn

modified Jun 25 at 21:10

Michael Aquilina
822412

1

vote

1answer

35 views

Multi variable gradient descent

I am learning gradient descent for calculating coefficients. Below is what I am doing: #!/usr/bin/Python import numpy as np # m denotes the number of examples here, not the number of features ...

python machine-learning linear-regression gradient-descent

modified Jun 25 at 20:10

DrV
3,1931214

0

votes

2answers

29 views

Mclust (R) equivalent package in Python

Is there an Mclust equivalent command or mclust equivalent package in Python? I searched the documentation for sklearn. It has GMM for classification, not for clustering. I have installed rpy2, but I ...

python r python-2.7 machine-learning

modified Jun 25 at 3:00

lgautier
4,037519

1

vote

1answer

33 views

Understanding format of data in scikit-learn

I am trying to work with multi-label text classification using scikit-learn in Python 3.x. I have data in libsvm format which I am loading using load_svmlight_file module. The data format is like ...

python numpy machine-learning scipy scikit-learn

modified Jun 24 at 12:28

larsmans
165k17237403

0

votes

1answer

63 views

What does “The sum of true positives and false positives are equal to zero for some labels.” mean?

I'm using scikit-learn to perform cross-validation with StratifiedKFold to compute the F1 score, but it says that for some of my labels the sum of true positives and false positives is equal to ...

python machine-learning scikit-learn

modified Jun 24 at 7:54

mbatchkarov
4,64621235

10

votes

3answers

6k views

Using frequent itemset mining to build association rules?

I am new to this area as well as the terminology so please feel free to suggest if I go wrong somewhere. I have two datasets like this: Dataset 1: A B C 0 E A 0 C 0 0 A 0 C D E A 0 C 0 E The way I ...

python machine-learning data-mining

modified Jun 23 at 12:48

Phil
1,104925

10

votes

6answers

3k views

Is there a good and easy way to visualize high dimensional data?

Can someone please tell me if there is a good (easy) way to visualize high dimensional data? My data is currently 21 dimensions but I would like to see whether it is dense or sparse. Are there ...

python language-agnostic graph machine-learning

modified Jun 23 at 3:51

user1707933
1

0

votes

0answers

38 views

Multilabel grid search in ScikitLearn

I am new to scikit-learn and I want to find the best parameters for a multi-label classification problem with scikit-learn's GridSearch. I cannot get it working and I am pretty sure there is something ...

python machine-learning scikit-learn

modified Jun 20 at 4:59

Egor Lakomkin
127211

2

votes

1answer

54 views

How do you estimate the performance of a classifier on test data?

I'm using scikit to make a supervised classifier and I am currently tuning it to give me good accuracy on the labeled data. But how do I estimate how well it does on the test data (unlabeled)? Also, ...

python machine-learning scikit-learn

modified Jun 19 at 22:33

Barmaley.exe
1,683613

0

votes

1answer

38 views

Is it possible to reverse the transformation of KMeans in sklearn?

After clustering a dataset and then transforming the data to the distance from the centroids using sklearn.cluster.KMeans, is it possible to reverse the transformation, given the centroids, getting ...

python machine-learning scikit-learn k-means dimensionality-reduction

modified Jun 18 at 23:08

BartoszKP
13.5k61742

3

votes

2answers

36 views

Performance issue in computing multiple linear regression with huge data sets

I am using np.linalg.lstsq for calculating the multiple linear regression. My data set is huge: has 20,000 independent variables(X) and 1 dependent variable (Y). Each independent variable has 10,000 ...

python numpy machine-learning linear-regression

modified Jun 18 at 15:29

user3684792
337

0

votes

1answer

52 views

Saving a feature vector for new data in scikit-learn

To create a machine learning algorithm I made a list of dictionaries and used scikit's DictVectorizer to make a feature vector for each item. I then created an SVM model from a dataset using part of ...

python machine-learning scikit-learn

modified Jun 18 at 3:10

Shakesbeery
5017
