Pandas is a Python data analysis library.

6
votes
3 answers
101 views

Chi Square Independence Test for Two Pandas DF columns

I want to run scipy.stats.chi2_contingency() on two columns of a pandas DataFrame. The data is categorical, like this: ...
8
votes
1 answer
120 views

A big “Game of Life”

Our quest: Create a big simulation for Conway's Game of Life, and record the entire simulation history. Current Approach: Cython is used for an iterate method. The ...
1
vote
1 answer
207 views

Split Excel file with multiple sheets, manipulate the data, and create final output file

I have an Excel file with 20+ separate sheets containing tables of data. My script iterates through each sheet, reshapes the data into the format I want, and then saves it to a final output file. ...
5
votes
1 answer
115 views

A custom Pandas dataframe to_string method

Oftentimes I find myself converting pandas.DataFrame objects to lists of formatted row strings, so I can print the rows into, e.g. a ...
4
votes
1 answer
35 views

Identifying surface events happening at specific time intervals

Here is some code I wrote to surface Mint.com transactions that occur at monthly intervals, in order to identify subscriptions I may be paying for without realizing it. I'd like to have some friends ...
3
votes
2 answers
132 views

Long expression to sum data in a multi-dimensional model

I am porting a linear optimization model for power plants from GAMS to Pyomo. Models in both frameworks are a collection of sets (either elementary or tuple sets), parameters (fixed values, defined over ...
3
votes
1 answer
17 views

Reading and processing a file using Pandas

I am trying to read a file using pandas and then process it. For opening the file I use the following function: ...
6
votes
0 answers
86 views

Tkinter GUI for making very simple edits to pandas DataFrames

It is part of a separate application that allows users to interact very loosely with different databases and check for possible errors and make corrections. ...
4
votes
0 answers
37 views

Outputting scatter plots

I have written a Python function that outputs scatter plots using Matplotlib after processing the data a little. It works, but it's painfully slow. I was wondering if anybody had any suggestions as to ...
4
votes
3 answers
155 views

Rating tennis players in a database, taking days to run

I have a data analysis project that creates a ranking of tennis players. Currently, it takes more than 6 days to run on my computer. Can you review the code and see where the problem is? ...
3
votes
1 answer
105 views

Imputing values with non-negative matrix factorization

X is a DataFrame with about 90% missing values and around 10% actual values. My goal is to use NMF in a successive imputation loop to predict the actual values I have ...
7
votes
1 answer
48 views

Querying houses similar to a given house

I was given this task as an interview coding challenge and was wondering if the code is well structured and follows Python guidelines. I chose to sort the houses based on a similarity metric and then ...
0
votes
1 answer
166 views

Simple k-means implementation using Python 3 and Pandas

Is there anything I can improve? The distance function is Pearson correlation. ...
1
vote
0 answers
160 views

Speed up projection of a bipartite network for a big file using NetworkX and Pandas

I have a pretty big file (3 million lines) with each line being a person-to-event relationship. Ultimately, I want to project this bipartite network onto a single-mode, weighted network, and write it ...
2
votes
1 answer
64 views

Extracting contents of dictionary contained in Pandas dataframe to make new dataframe columns

I created a Pandas dataframe from a MongoDB query. c = db.runs.find().limit(limit) df = pd.DataFrame(list(c)) Right now one column of the dataframe corresponds ...
3
votes
1 answer
53 views

Data cleansing and formatting script

This is a script that creates a base dataframe from a SQLite database, adds data to it (also from SQLite), cleanses it, and formats it all into an Excel file. I feel like it is incredibly verbose and my ...
3
votes
1 answer
57 views

Fetching, processing, and storing Mixpanel analytics data to SQLite

I'm a self-taught Python programmer and I never really learned the fundamentals of programming, so I want to see how to improve upon this script and make it adhere to best practices. The script has ...
4
votes
0 answers
128 views

Parsing URLs in Pandas DataFrame

My client needs their Google AdWords destination URL query parsed and the values spell-checked to eliminate any typos ("use" instead of "us", etc.). I'm pulling the data using the AdWords API and ...
1
vote
1 answer
25 views

Excel Laboratory Data Entry from Python 2.7

I've written a script to automate the entry of laboratory instrument data into an Excel spreadsheet using pandas and win32com. I've got the script functioning correctly, but it is painfully slow. In ...
0
votes
1 answer
553 views

Better implementation of Excel's SUMIFS using pandas, the Python Data Analysis Library

I've implemented Excel's SUMIFS function in Pandas using the following code. Is there a better—more Pythonic—implementation? ...
2
votes
0 answers
2k views

Apriori algorithm using Pandas

I want to optimize my Apriori algorithm for speed: ...
1
vote
0 answers
464 views

Inventory simulation using Pandas DataFrame

I've been learning Python for about a year and started learning a little about the pandas DataFrame. I made this little program to practice all the concepts and would ...
5
votes
1 answer
117 views

Reading an Excel file and comparing the amino acid sequence of each data pair

Since I am fairly new to Python I was wondering whether anyone can help me by making the code more efficient. I know the output stinks; I will be using Pandas to make this a little nicer. ...
5
votes
1 answer
615 views

Efficient Pandas to MySQL “UPDATE… WHERE”

I have a pandas DataFrame and a (MySQL) database with the same columns. The database is not managed by me. I want to update the values in the database in an "UPDATE... WHERE" style, updating only ...
2
votes
2 answers
239 views

Matplotlib-venn and keeping lists of the entries

Having come upon the wonderful little module matplotlib-venn and used it for a bit, I'm wondering if there's a nicer way of doing things than what I have done so far. I know that you can use the ...
3
votes
1 answer
4k views

Working with pandas dataframes for stock backtesting exercise

I'm attempting to apply a long set of conditions and operations onto a pandas dataframe (see the dataframe below with VTI, upper, lower, etc). I attempted to use apply, but I was having a lot of ...
2
votes
1 answer
82 views

SQL GROUPING SETS in Python using Pandas

The code below is intended to provide SQL's GROUPING SETS functionality in Python with the aid of Pandas. Background on SQL GROUPING SETS. There are at least two advantages to doing this in Python: ...
5
votes
2 answers
76 views

Speed up script that calculates distribution of every character from user input

I have a data set with close to 6 million rows of user input. Specifically, users were supposed to type in their email addresses, but because there was no pattern validation put in place we have a ...
5
votes
2 answers
142 views

Monte Carlo estimation of the Hypergeometric Function

I am trying to implement the algorithm described in the paper Statistical Test for the Comparison of Samples from Mutational Spectra (Adams & Skopek, 1986) DOI: 10.1016/0022-2836(87)90669-3: $$p ...
1
vote
0 answers
154 views

Speed up Pandas DataFrame expansion to include time-lagged information about events

Using pandas and Python 3, information about a simple time series data set is being processed. Within the span of 0.5 seconds, 3 names are being said. We record the onset of each utterance, the length ...
1
vote
1 answer
786 views

Parse Bloomberg Excel/CSV with Pandas DataFrame

I retrieved Bloomberg data using the Excel API. In the typical fashion, the first row contains tickers in every fourth column, and the second row has the labels Date, PX_LAST, [Empty Column], Date, ...
1
vote
0 answers
150 views

Speeding up filtering function in Pandas

I have a CSV file with 400 000 rows and the following headers: ...
1
vote
0 answers
64 views

Generate features for future ML analysis of asset returns

I have built the following code to download stock data from Yahoo Finance. The plan is to then use the built-in pandas functions to calculate metrics from this data ...
10
votes
1 answer
762 views

Simplifying Python Pandas code for selecting co-occurrences in a window of time

I am a beginner at programming. I was able to build the thing below, which achieves what I want with a small dataset. With larger datasets, my RAM gets swamped, bringing the computer to a halt (2014 ...