Pandas is a Python data analysis library.
3
votes
1answer
44 views
Applying different equations to a Pandas DataFrame
I wrote a task using pandas and I'm wondering if the code can be optimized. The code pretty much goes as:
All the dataframes are 919 * 919. socio is a dataframe ...
5
votes
2answers
128 views
Split latitude/longitude by degree to make file names and folder directory names
This code works and does exactly as I want, but I am wondering if/how it could be done faster/more efficiently? This runs quickly on a demo file, but bogs down when I introduce much larger files (3-...
1
vote
2answers
346 views
Finding the states with the three most populous counties
I just started to use Python and Pandas. My current solution to a problem looks ugly and inefficient. I would like to know how to improve it. The data file is Census 2010, which can be viewed here
Question:
...
2
votes
1answer
39 views
Simplify DataFrame operations in Python
I am working on learning how to do frequency analysis of Server Fault question tags to see if there is any useful data that I can glean from them. I'm storing the raw data in Bitbucket for global ...
4
votes
2answers
55 views
Select the n most frequent items from a pandas groupby dataframe
I'm working on trying to get the n most frequent items from a pandas dataframe similar to
...
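A minimal sketch of one way to approach the task named in the title above. The question's own data is elided, so the column names `group` and `item` and the sample rows are hypothetical, and this is not the asker's code. It counts items per group with `value_counts` (which sorts counts in descending order within each group) and then keeps the first `n` rows of each group.

```python
import pandas as pd

# Hypothetical sample data; the original question's frame is not shown.
df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "a", "a", "b", "b", "b"],
    "item":  ["x", "x", "x", "y", "y", "z", "p", "p", "q"],
})

n = 2

# Count each item within its group; counts come back sorted
# from most to least frequent inside every group.
counts = df.groupby("group")["item"].value_counts()

# Keep only the first n (most frequent) rows of each group.
top = counts.groupby(level=0).head(n)

print(top)
```

Ties between equally frequent items are resolved in whatever order `value_counts` returns them, so a stricter tie-breaking rule would need an explicit sort.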
1
vote
1answer
32 views
Calls many stored dataframes, change them and combine them
I call many dataframes (using pickle), then I extract the interesting values of each one and finally combine them. Is there a better way to do it? For example with a for loop call them automatically ...
3
votes
1answer
68 views
Compressing time series by removing repeated samples
I work on a project with time series data. So there are samples ($y$), and each sample has a timestamp ($x$). The data will be visualized, but often there are time series which contain samples ...
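A minimal sketch of the idea in the title above, assuming "repeated samples" means consecutive samples with the same value and that keeping only the first sample of each run is acceptable. The excerpt is truncated, so this rule, the function name `drop_repeated_samples`, and the sample data are assumptions rather than the asker's method.

```python
import pandas as pd

def drop_repeated_samples(series: pd.Series) -> pd.Series:
    """Keep a sample only when its value differs from the previous sample."""
    # shift() moves every value down one position, so the comparison is
    # "current value != previous value". The first sample is compared
    # against NaN and is therefore always kept.
    changed = series.ne(series.shift())
    return series[changed]

# Hypothetical samples (y) indexed by timestamps (x), one per minute.
timestamps = pd.date_range("2024-01-01", periods=6, freq="min")
y = pd.Series([1.0, 1.0, 1.0, 2.0, 2.0, 1.0], index=timestamps)

compressed = drop_repeated_samples(y)
print(compressed)
```

Because NaN never compares equal to NaN, runs of missing values are kept in full by this sketch. If the plot should draw flat segments correctly, the last sample of each run may need to be kept as well.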
4
votes
3answers
93 views
Plotting from a Pandas dataframe
I want to improve my code. Is it possible to get the plot without repeating the same instructions multiple lines?
The data comes from a Pandas' dataframe, but I am only plotting the last column (...
0
votes
0answers
37 views
Cross-join time-series using asof criteria while limiting number of rows
I have a unique requirement around time-series analysis and have presented my requirements along with a simple working solution. I have also worked out an example to help understand my ...
0
votes
0answers
15 views
Sending the accuracies of 4 cross-validations to a pandas dataframe
I want to improve my code.
It performs 4 cross-validations: 10x20, 10x10, 10x5, 10x2. Then it stores the values in Pandas where I will get the average of each column and multiply them by 100.
You ...
3
votes
0answers
55 views
Creating multiple observations from single observation depending on conditions
I have written a function in Python with three loops, which is time-consuming. Is it possible to do the same operation in less time some other way?
Here are my code and sample data you can run at ...
5
votes
3answers
126 views
Matching values from html table for updating values in pandas dataframe
This is more of an exercise for me to get used to Pandas and its dataframes. For those who haven't heard of it:
Pandas is a Python package providing fast, flexible, and expressive
data structures ...
3
votes
0answers
184 views
Pandas calculation speed of stock beta on many dataframes
I have many (4000+) CSVs of stock data (Date, Open, High, Low, Close) which I import into individual Pandas dataframes to perform analysis. I am new to Python and want to calculate a rolling 12-month ...
1
vote
1answer
218 views
Using pandas and sklearn for forecasting stock market return
I have been using R for stock analysis and machine learning purposes, but I read somewhere that Python is a lot faster than R, so I am trying to learn Python for that.
I am using Yhat's rodeo IDE (Python ...
6
votes
1answer
151 views
Similarity research : K-Nearest Neighbour(KNN) using a linear regression to determine the weights
I have a set of houses with categorical and numerical data. Later I will have a new house and my goal will be to find the 20 closest houses.
The code is working fine, and the results are not so bad but ...
2
votes
0answers
52 views
Using pandas.io.pytables.TableIterator for several iterations
I'm writing a method that takes a pandas iterator and then computes something based on that. The problem is that some of these computations require iterating twice over the data.
For example the ...
4
votes
1answer
79 views
Extracting time duration in the session from 30 million rows
I am looking to make my code faster. I am working on the YooChoose RecSys 2015 dataset [recsys2015] and trying to perform some transformations. It has more than 30 million rows of data. The goal of ...
3
votes
0answers
59 views
KNN pipeline w/ cross_validation_scores
Using the wine quality dataset, I'm attempting to perform a simple KNN classification (w/ a scaler, and the classifier in a pipeline). It works, but I've never used ...
3
votes
1answer
102 views
Calculate working minutes between two timestamps
I've created a function to calculate the working minutes between two timestamps.
I class working minutes as those between 9am - 5pm and not on weekends or national holidays.
The holidays are those ...
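A minimal sketch of the rule described above (9am to 5pm, excluding weekends and holidays), not the asker's function. The excerpt does not say which holidays apply, so the `holidays` parameter and the dates used below are hypothetical. The sketch walks over each calendar day in the span and adds the overlap between the requested interval and that day's working window.

```python
import pandas as pd

def working_minutes(start, end, holidays=()):
    """Minutes between start and end that fall in 09:00-17:00 on working days."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Hypothetical holiday list supplied by the caller, compared by date only.
    holiday_dates = {pd.Timestamp(h).date() for h in holidays}

    total = 0.0
    for day in pd.date_range(start.normalize(), end.normalize(), freq="D"):
        # dayofweek: Monday=0 ... Saturday=5, Sunday=6
        if day.dayofweek >= 5 or day.date() in holiday_dates:
            continue
        # Clip the requested interval to this day's 09:00-17:00 window.
        window_open = max(start, day + pd.Timedelta(hours=9))
        window_close = min(end, day + pd.Timedelta(hours=17))
        if window_close > window_open:
            total += (window_close - window_open).total_seconds() / 60
    return total

# Friday 16:00 to the following Monday 10:00:
# one working hour on Friday plus one on Monday.
minutes = working_minutes("2024-01-05 16:00", "2024-01-08 10:00")
print(minutes)
```

Looping day by day keeps the logic easy to check. For very long spans or many rows, a vectorized approach would likely be faster.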
3
votes
1answer
34 views
Filling in gym membership prices by joining PANDAS dataframes
I have two dataframes, df which contains a list of members in addition to the type of contract that they purchased on a given date, ...
3
votes
2answers
72 views
Optimize a simple and quick python script for transposing a .csv file
I need to transpose the following file output1.csv, which is a result from a quantum chemistry calculation, into a single column efficiently:
...
1
vote
0answers
32 views
Grouping logs and providing counts
The code is supposed to group start and end time logs and provide log counts and unique ID counts. The grouping will be variable 1 hour, 6 hours, 12 hours, 24 hours, etc.
What is the better way to ...
1
vote
0answers
27 views
ChiSquare test Code in Python
I was trying to write code from scratch for the Chi-square test. This is the code that I wrote in Python using pandas. I am unsure whether the code produces the desired output or can be written ...
3
votes
0answers
376 views
Rolling OLS algorithm in a dataframe
I want to be able to find a solution to run the following code in a much faster fashion (ideally something like dataframe.apply(func) which has the fastest speed, ...
1
vote
2answers
74 views
Permute and count between nested dictionaries
Goal:
Permute inner values between two dictionaries with matching outer key,
Save permutations in counter,
Move to next outer key, update counter with new permutations.
Problem:
Many lookups on ...
3
votes
0answers
55 views
Analysis of TV station preferences for various demographic groups
I have a notebook here which details my code and may be more legible.
I worked on a project that flattens a table that initially looked like this (this is table1):
...
1
vote
1answer
752 views
Python Pandas Apply with a Lambda Function
I have a table in pandas that has two columns, QuarterHourDimID and StartDateDimID ; these columns give me an ID for each date / ...
2
votes
1answer
65 views
Counting matches won by teams
This script counts how many matches were won by the team with a first half, second half, and full game time of possession differential advantage.
The data is stored in a pandas data frame:
...
0
votes
0answers
71 views
Memory efficient alternative passing mostly static variables w/ 1 dynamic variable to an external function (COM) instead of a for loop in Python
Massive data problem, 7/8 variables are repeated 1000 times in a for loop, sending 8 input vectors * 50,000 values per loop for each function call. Only 1 input ...
1
vote
1answer
64 views
SKlearn automate data pre treatment
I want to make a simple wrapper for sklearn models. The idea is that the wrapper automatically takes care of factors (columns of type "object") replacing them with ...
3
votes
0answers
161 views
Python Script to query OSRM (drive-times)
The below script takes two arguments
Path to the OSRM Map
Path to a .csv containing the columns ...
3
votes
1answer
71 views
Backing up files to remote servers
I have code that backs up files to remote servers.
There are 2 pandas data frames:
storedfiles: that points to the files already stored in previous runs
...
3
votes
1answer
111 views
Analysis of call center employee performance
This code basically calculates a few metrics based on numerators and denominators and then it bins the data into quintiles. This is a common operation that I perform for various data sets. My issue is ...
3
votes
1answer
260 views
PANDAS spatial clustering
I am working on a spatial clustering algorithm using pandas and scipy's kdtree. I profiled the code and the .loc part takes the most time for bigger datasets. I wonder ...
13
votes
3answers
636 views
Analyze frequency and content of political fundraising E-mails
Since I'm a big politics nerd, I wanted to write a little script that would analyze the frequency and content of political fundraising emails. I signed up for the e-mails of 6 campaigns, donated a ...
4
votes
1answer
102 views
Data analytics on static file of 50,000+ tweets
I'm trying to optimize the main loop portion of this code, as well as learn any "best practices" insights I can for all of the code. This script currently reads in one large file full of tweets (50MB ...
4
votes
1answer
54 views
Appending unique data to a postgres table
I receive daily files (filename_%m_%d_%Y.csv) from a client and I read those in pandas, process them, and store them in Postgres. Sometimes there are delays and we do not get the data for a few days. ...
4
votes
3answers
99 views
Load recurring (but not strictly identical) sets of Key, Values into a DataFrame from text files
I am reading text files that contain data from observations. The format is not Fixed Width or Delimited, so I built a generator ...
4
votes
1answer
148 views
Using Pandas to parse adwords export
I did this exercise yesterday mostly as practice, but it has some utility as well in day to day. I was basically attempting to take a string that looked like the following:
...
5
votes
1answer
71 views
Slicing time spans into calendar months
I have apparently correct code that still runs for weeks on my data (tens of millions of rows). I show the entire code for reference (and maybe other gains to be made), but the key operation is in the ...
1
vote
1answer
42 views
Multi-linear regression with trivial model refinement algorithms
I wrote this code for an assignment. It was originally meant to be written in Maple, but I got so frustrated with some of Maple's idiosyncrasies that I decided to play around with Pandas instead. This ...
2
votes
2answers
132 views
Generating a PANDAS DataFrame of simulated coin tosses
It's taking my machine quite a long time to execute 1 billion (1st loop x 10, 2nd loop x 1000, 3rd loop x 100,000) instructions. Suggestions for performance enhancements? Sources of potential concern: ...
4
votes
1answer
102 views
Random Forest Code Optimization
I am new to Python. I have built a model with Random Forest in Python, but I think my code is not optimized. Please look into my code and suggest whether I have deviated from best practices.
Overview about ...
7
votes
2answers
293 views
Project Euler #19: Counting Sundays in the 20th century using Pandas
Project Euler #19 asks:
How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?
I'm hoping I wasn't too off course from the spirit of the exercise ...
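A minimal sketch of the counting task quoted above, assuming pandas' `date_range` with the month-start frequency alias `MS`. It is not the asker's code, and the variable names are illustrative. It builds the first day of every month in the period and counts how many of those dates are Sundays.

```python
import pandas as pd

# First day of every month from Jan 1901 through Dec 2000 (1200 dates).
month_starts = pd.date_range("1901-01-01", "2000-12-31", freq="MS")

# dayofweek numbers days Monday=0 ... Sunday=6.
sunday_count = int((month_starts.dayofweek == 6).sum())

print(sunday_count)  # 171
```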
1
vote
1answer
204 views
Operating multiple columns of one pandas DataFrame using data from another
I have a DataFrame of data from a survey that was repeated over several years, asking people about their income and how much money they had in savings. For simplicity, let's pretend it looks like this:...
2
votes
1answer
164 views
Creating a time-course dependent, correlation-based directed graph with Networkx
I have a correlation matrix containing 4 time points, each with multiple samples. Each sample is identified with a time point with its name. What I am trying to accomplish here is to create a directed ...
4
votes
1answer
42 views
Reading groups of files and concatenating them
I have made some adjustments to some code that you can see on this thread:
Read daily files and concatenate them
I would like to make some further refinements and make sure I am on the right track ...