All Questions
Tagged with pandas data-mining
9 questions
1 vote, 1 answer, 317 views
Better way to create a contingency table with pandas for film genres from a Film DataFrame
From a publicly available film-rating dataset, I created a contingency table as follows.
Honestly, I don't like all these "for-loops"; I think the quality of the code can definitely be improved ...
2 votes, 1 answer, 505 views
Analyzing patient treatment data using Pandas
I work in the population health industry and get contracts from commercial companies to conduct research on their products. This is the general code to identify target patient groups from a provincial ...
3 votes, 0 answers, 524 views
Pandas data extraction task taking too much memory. How to optimize for memory usage?
I need to process some data: one of its columns contains a JSON/dict of params, and I need to extract those params into individual columns of their own. The catch: some rows have some parameters, others have ...
5 votes, 0 answers, 213 views
Code for training machine learning linear regression and SVM
OK, for my final-year project I've written this piece of code to train my machine learning model on this dataset. Here is the code I used:
...
6 votes, 3 answers, 10k views
Gradient descent for linear regression using numpy/pandas
I am currently following along with Andrew Ng's Machine Learning course on Coursera and wanted to implement the gradient descent algorithm in Python 3 using ...
3 votes, 1 answer, 95 views
Calculating frequencies of each obs in the data
I am currently attempting to make some code more maintainable for a research project I am working on. I am definitely looking to create some more functions, and potentially create a general class to ...
7 votes, 1 answer, 475 views
PANDAS DataFrame operations to analyze top Server Fault tags [closed]
I am working on learning how to do frequency analysis of Server Fault question tags to see if there is any useful data that I can glean from them. I'm storing the raw data in Bitbucket for global ...
5 votes, 1 answer, 285 views
Data analytics on static file of 50,000+ tweets
I'm trying to optimize the main loop portion of this code, as well as learn any "best practices" insights I can for all of the code. This script currently reads in one large file full of tweets (50MB ...
6 votes, 1 answer, 26k views
Apriori algorithm using Pandas
I want to optimize my Apriori algorithm for speed:
...