0
votes
1answer
19 views

How do I update dataframe column values with Pandas?

So say I have a bunch of data that goes like this: 10-12-2014 3.45 10-12-2014 3.67 10-12-2014 4.0 10-12-2014 5.0 10-13-2014 6.0 10-13-2014 8.9 and so on. I want to put this stuff into a Pandas ...
0
votes
1answer
24 views

Define two columns with one map in Pandas DataFrame

I have a function which returns a list of length 2. I would like to apply this function to one column in my dataframe and assign the result to two columns. This actually works: from pandas import * ...
0
votes
1answer
33 views

Calculate special correlation distance matrix faster

I would like to build a distance matrix using Pearson correlation distance. I first tried the scipy.spatial.distance.pdist(df,'correlation') which is very fast for my 5000 rows * 20 features dataset. ...
1
vote
1answer
19 views

Build access matrix from raw data with python pandas

I have data that look like this (pandas dataframe): | User | Resource | |=======|========= | | User1 | Res_1 | | User1 | Res_8 | | User2 | Res_1 | | User2 | Res_2 | | User3 | Res_8 | ...
2
votes
1answer
13 views

How to convert wide to long format with hourly values and datetime index?

I'm retrieving data from a fixed SQL schema in long format and want to convert it to wide format. As a complication each row in the DataFrame represents the values for a product for a day. Values ...
0
votes
0answers
22 views

Is there a multiple column map function for dataframes?

In Pandas, how can one column be derived from multiple other columns? For example, let's say I wanted to annotate my dataset with the correct form of address for each subject. Perhaps to label some ...
3
votes
1answer
21 views

Box Plot Trellis

Consider I have some data. Let's say it is weather data, of rainfall and temperature for each month. For this example, I will randomly generate it like so: def rand_weather(n): month = n%12+1 ...
0
votes
0answers
9 views

How can I create multiple histograms with pandas?

I have a csv file with three columns: Full name, Test_A_Score, Test_B_Score. Test_A_Score and Test_B_Score range from 0-10. My aim is for every unique value of Test_A_Score to create a histogram from ...
-1
votes
1answer
19 views

Filter pandas Dataframe based on max values in a column

I have a DataFrame with repeating values in the index. I would like to filter this dataset down to only show me one instance of each index by selecting the row within the index with the greatest value ...
0
votes
1answer
9 views

How to feed sqlite query data to Pandas scatter_matrix

I am successfully pulling data from a Fitbit sqlite db using Python sqlite3 as follows. I want to create Pandas scatter_matrix on the data. My code that successfully gets data is: import ...
0
votes
0answers
19 views

Creating pandas.DataFrame from numpy.array

Here's what I'm working with: import numpy as np import pandas as pd np.__version__ # '1.8.1' pd.__version__ # '0.14.1-107-g381a289' Here is some fake data: foo = np.arange(5) bar = ...
0
votes
1answer
15 views

Writing full contents of Pandas dataframe to HTML table

I am embedding links in one column of a Pandas dataframe (table, below) and writing the dataframe to HTML. Links in the dataframe table are formatted as shown (indexing first link in table): In: ...
2
votes
1answer
22 views

Resolve Pandas data frame merge conflicts with a function?

Let's say I have two dataframes, which I would like to merge, but there is a conflict because rows and columns overlap. Instead of duplicating the rows, I would like to pass a function to resolve the ...
3
votes
1answer
24 views

Return output of function that takes pandas dataframe as a parameter

I have a pandas dataframe that looks like: d = {'some_col' : ['A', 'B', 'C', 'D', 'E'], 'alert_status' : [1, 2, 0, 0, 5]} df = pd.DataFrame(d) Quite a few tasks at my job require the same ...
2
votes
1answer
27 views

Pandas - read_hdf or store.select returning incorrect results for a query

I have a large dataset (4 million rows, 50 columns) stored via pandas store.append. When I use either store.select or read_hdf with a query for 2 columns being greater than a certain value (i.e. "(a ...
2
votes
2answers
22 views

How to add another label-like column for a dataframe in python?

Suppose I have a dataframe like this: id openPrice closePrice 1 10.0 13.0 2 20.0 15.0 I want to add another column called 'movement': if ...
0
votes
1answer
22 views

Installation Package, for using Pandas

I have little experience with Python, but I'm tasked with creating an installation package so that I can distribute a GUI that messes with Time Series of Stocks taken from online. Based on what ...
0
votes
2answers
28 views

Pandas dataframe merge and element-wise multiplication

I have a dataframe like df1 = pd.DataFrame({'name':['al', 'ben', 'cary'], 'bin':[1.0, 1.0, 3.0], 'score':[40, 75, 15]}) bin name score 0 1 al 40 1 1 ben 75 2 3 cary 15 ...
0
votes
1answer
20 views

Why is my pandas bar chart not symmetrical

I'm running this code: import pandas as pd pd.Series([-0.049, 0.039, 0.002, -0.165]).plot(kind='bar') and getting this: Why are the bars not centered with the plot area? -piR Update: import ...
0
votes
2answers
26 views

Calculate date variance based on common id

I have a large table that looks like the following: +---+---------+----------+-------+---------+------------+ | | cust_id | order_id | quant | revenue | date | ...
0
votes
0answers
15 views

Store to HDF, can't store frequency

I have a dataframe that has a custom frequency index, like so, holidays = CustomBusinessDay(holidays=[ pnd.Timestamp(d) for d in pnd.Series.from_csv(f).values]) timestamps = pnd.date_range(s, e, ...
0
votes
1answer
24 views

.gz file to pandas DataFrame weird delimiter

I am getting a very odd result when I try to load my .gz data file. My code is pretty simple dt = pd.read_table(gzip.open(file.gz)) but I get a very odd delimiter. I had expected a tab ('\t') but ...
1
vote
1answer
18 views

Python Parsing HTML Table Generated by JavaScript

I'm trying to scrape a table from the NYSE website (http://www1.nyse.com/about/listed/IPO_Index.html) into a pandas dataframe. In order to do so, I have a setup like this: def htmltodf(url): page = ...
1
vote
1answer
39 views

Regression summary output: Order of categories

This question is about the way the result of a GLM is printed, that is, the order in which the coefficients are printed. By "order" I'm not referring to any statistical meaning of this term. The ...
1
vote
1answer
22 views

Pandas Multiindex not working with read_csv and datetime objects

I have a problem loading a dataframe from csv when I have a multiindex with more than one date in it. I am running the following code: import pandas as pd import datetime date1 = ...
0
votes
4answers
44 views

Get the mean across multiple Pandas DataFrames

I'm generating a number of dataframes with the same shape, and I want to compare them to one another. I want to be able to get the mean and median across the dataframes. Source.0 Source.1 ...
1
vote
1answer
30 views

Using pandas to read downloaded html file

As the title says, I tried using read_html, but it gives me the following error: In [17]:temp = pd.read_html('C:/age0.html',flavor='lxml') File "<string>", line unknown XMLSyntaxError: htmlParseStartTag: ...
0
votes
0answers
15 views

Plot graphs in a loop from Pandas Groupby object

I'm quite new to pandas and I'm looking for a solution to plot several graphs with several series in a loop from a Groupby object. Below is the Groupby object gp_df = ...
0
votes
1answer
14 views

pandas: filter intraday df by non-consecutive list of dates

I have dataframes of 1 minute bars going back years (the datetime is the index). I need to get a set of bars covering an irregular (non-consecutive) long list of dates. For daily bars, I could do ...
-1
votes
0answers
17 views

Alternatives to R's forecast() and ETS() package in Python? [on hold]

I'm looking for a Python alternative to R's ETS() from forecast(). It's my understanding that ETS() is one of the best performing forecasting programs and I would like to use it. However I am ...
0
votes
3answers
31 views

How to filter in NaN (pandas)?

I have a pandas dataframe (df), and I want to do something like: newdf = df[(df.var1 == 'a') & (df.var2 == NaN)] I've tried replacing NaN with np.NaN, or 'NaN' or 'nan' etc, but nothing ...
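A minimal sketch of one common approach to the NaN comparison described above, assuming the goal is to keep rows where `var1` equals `'a'` and `var2` is missing. Because NaN never compares equal to anything (including itself), `== NaN` cannot match; `Series.isnull()` tests for missing values instead. The sample data here is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "var1": ["a", "a", "b"],
    "var2": [np.nan, 1.0, np.nan],
})

# NaN != NaN, so an equality test never matches; isnull() flags missing values.
newdf = df[(df["var1"] == "a") & (df["var2"].isnull())]
print(newdf)
```

Bracket access (`df["var1"]`) is used rather than attribute access so the sketch does not depend on column names being valid Python identifiers.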
0
votes
1answer
21 views

Resampling over dates in both levels of a MultiIndex Pandas DataFrame

I have a pandas DataFrame with a 2-level MultiIndex. Both levels of the MultiIndex are identical date ranges, spaced daily. I want to resample the DataFrame on a weekly basis, for both levels of the ...
3
votes
1answer
29 views

Load list with date value into pandas dataframe and plot activity over time

I have some Twitter data for which I would like to plot activity over time based on the type of tweet (tweet/mention/retweet). The data is currently loaded into a list of tuples that contains date and ...
1
vote
3answers
32 views

Aggregating unbalanced panel to time series using pandas

I have an unbalanced panel that I'm trying to aggregate up to a regular, weekly time series. The panel looks as follows: Group Date value A 1/1/2000 5 A 1/17/2000 ...
0
votes
1answer
35 views

Pandas: Merge array is too big, how to merge in chunks?

When trying to merge two dataframes using pandas I receive this message: "ValueError: array is too big." I estimate the merged table will have about 5 billion rows, which is probably too much for my ...
-1
votes
0answers
18 views

Boxes instead of dash in pandas plot

I am seeing the following (boxes instead of dashes) when plotting with pandas; I was wondering why this is and if there is a way to fix it? Any help would be very much appreciated. I'm using Mint ...
1
vote
0answers
28 views

Filling in with the last entry of the group

Say I have the following dataframe: > df C D E A B bar one -1.350006 0.260339 2 three -0.236451 -0.056614 0 flux six ...
0
votes
1answer
28 views

Pandas: Adding x number of Columns together based on value (x) in another column

Test dataset: df = pd.DataFrame({'A':[2,2,2,], 'B':[2,2,2], 'C':[2,2,2], 'Fields':[3,2,1]}) I need to add the values of 'A', 'B', and 'C' together based on the value in each row of the 'Fields' ...
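The excerpt above is cut off, so the following is only a sketch under an assumed reading: that `Fields` gives, per row, how many of the leading columns `A`, `B`, `C` should be summed. The `Total` column name is hypothetical.

```python
import numpy as np
import pandas as pd

# Test dataset from the question.
df = pd.DataFrame({"A": [2, 2, 2], "B": [2, 2, 2], "C": [2, 2, 2],
                   "Fields": [3, 2, 1]})

# Assumption: sum the first `Fields` columns of A, B, C in each row.
values = df[["A", "B", "C"]].values
# mask[i, j] is True when column j is among the first Fields[i] columns.
mask = np.arange(values.shape[1]) < df["Fields"].values[:, None]
df["Total"] = (values * mask).sum(axis=1)  # hypothetical output column
print(df)
```

With this reading, the three rows sum three, two, and one column respectively. If the intended rule differs, only the mask construction would need to change.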
0
votes
1answer
24 views

dcast replicate in python

I have a big pandas data frame something like the following: col1 col2 col3 col4 a d sd 2 b sd sd 2 a ds hg 3 a ew ...
2
votes
4answers
46 views

pandas: How to find the max n values for each category in a column

I have a huge municipal library catalog dataset with book title, the library it's in, the library's borough, and the number of times it was loaned out. I want to find the top 3 most loaned books for ...
0
votes
0answers
22 views

Correct way to check if Pandas DataFrame index is a certain type (DatetimeIndex)

In the code below I want to check if the index in the dataframes is of type DatetimeIndex. Is this a correct way of doing this? Is there a better way to do this than with the if statement? It seems ...
0
votes
1answer
34 views

Matplotlib timelines

I'm looking to take a python DataFrame with a bunch of timelines in it and plot these in a single figure. The DataFrame indices are Timestamps and there's a specific column, we'll call "sequence", ...
0
votes
1answer
35 views

find numeric columns in pandas (python)

Say df is a pandas DataFrame. I would like to find all columns of numeric type. Something like: isNumeric = is_numeric(df)
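One way this could be done, sketched with hypothetical sample data: `DataFrame.select_dtypes` can restrict a frame to columns whose dtype is a NumPy numeric type. The `is_numeric` name in the question is the asker's placeholder, not a pandas function.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with integer, float, and string columns.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})

# Keep only columns with a numeric dtype, then read off their names.
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
print(numeric_cols)
```

Note that this inspects dtypes only: a column of numeric-looking strings has dtype `object` and would not be selected.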
1
vote
1answer
35 views

specifying “skip NA” when calculating mean of the column in a data frame created by Pandas

I am learning the Pandas package by replicating the output from some of the R vignettes. Now I am using the dplyr package from R as an example: ...
0
votes
1answer
24 views

Pandas: convert MultiIndex column headers into normal column headers?

My data frame looks like this, with the column headers being a MultiIndex (True, False are boolean type). date a value id False True 0 2013-11-26 0 ...
1
vote
1answer
27 views

Turn column into multiple columns based on number of commas within string python pandas

I have a dataframe that looks like the below. Focus on column 9. I'd like to turn each string that comes after a comma into a new column. So in Column 9, row 4, 'Ca., Cal.' I'd like 'Ca.' to remain ...
0
votes
1answer
25 views

Multiindex in pandas pivot table

I am working on a pivot table that looks like this: Style Site AVS End Qty. \ JP SIZE 116 120 140 ADULTS L M O ...
0
votes
1answer
25 views

Get Date of last record in Date indexed pandas dataframe

I am reading an End of Day price csv file and use the date column to index the dataframe. I want to check the date of the last record. I get the index value location, but have not figured out how to ...
1
vote
1answer
37 views

Is there a quick function for look-back calculations in a pandas DataFrame?

I want to implement a calculation for a simple scenario: a value computed as the sum of daily data during the previous N days (set N = 3 in the following example) Dataframe df: (df.index is ...
0
votes
2answers
25 views

Pandas DataFrame Replace every value by 1 except 0

I have a pandas DataFrame like the following. 3,0,1,0,0 11,0,0,0,0 1,0,0,0,0 0,0,0,0,4 13,1,1,5,0 I need to replace every value except '0' with '1'. So my expected output is: 1,0,1,0,0 1,0,0,0,0 ...
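A minimal sketch of one possible approach, assuming all values are numeric: comparing the frame with 0 gives a boolean frame, and casting that to integers maps every nonzero value to 1 and leaves zeros as 0. The data is taken from the question; the default integer column labels are an assumption.

```python
import pandas as pd

# Data from the question; column labels are assumed to be the defaults.
df = pd.DataFrame([
    [3, 0, 1, 0, 0],
    [11, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 4],
    [13, 1, 1, 5, 0],
])

# True where the value is nonzero; casting to int turns True/False into 1/0.
result = (df != 0).astype(int)
print(result)
```

This builds a new frame rather than modifying `df` in place, which keeps the original values available for comparison.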