Tagged Questions
0
votes
1answer
19 views
How do I update dataframe column values with Pandas?
So say I have a bunch of data that goes like this:
10-12-2014 3.45
10-12-2014 3.67
10-12-2014 4.0
10-12-2014 5.0
10-13-2014 6.0
10-13-2014 8.9
and so on.
I want to put this stuff into a Pandas ...
0
votes
1answer
24 views
Define two columns with one map in Pandas DataFrame
I have a function which returns a list of length 2. I would like to apply this function to one column in my dataframe and assign the result to two columns.
This actually works:
from pandas import *
...
0
votes
1answer
33 views
Calculate special correlation distance matrix faster
I would like to build a distance matrix using Pearson correlation distance.
I first tried the scipy.spatial.distance.pdist(df,'correlation') which is very fast for my 5000 rows * 20 features dataset.
...
1
vote
1answer
19 views
Build access matrix from raw data with python pandas
I have data that look like this (pandas dataframe):
| User | Resource |
|=======|========= |
| User1 | Res_1 |
| User1 | Res_8 |
| User2 | Res_1 |
| User2 | Res_2 |
| User3 | Res_8 |
...
2
votes
1answer
13 views
How to convert wide to long format with hourly values and datetime index?
I'm retrieving data from a fixed SQL schema in long format and want to convert it to wide format.
As a complication each row in the DataFrame represents the values for a product for a day. Values ...
0
votes
0answers
22 views
Is there a multiple column map function for dataframes?
In Pandas,
How can one column be derived from multiple other columns?
For example, let's say I wanted to annotate my dataset with the correct form of address for each subject.
Perhaps to label some ...
3
votes
1answer
21 views
Box Plot Trellis
Suppose I have some data.
Let's say it is weather data, of rainfall and temperature for each month.
For this example, I will randomly generate it like so:
def rand_weather(n):
month = n%12+1
...
0
votes
0answers
9 views
How can I create multiple histograms with pandas?
I have a csv file with three columns: Full name, Test_A_Score, Test_B_Score. Test_A_Score and Test_B_Score range from 0-10. My aim is for every unique value of Test_A_Score to create a histogram from ...
-1
votes
1answer
19 views
Filter pandas Dataframe based on max values in a column
I have a DataFrame with repeating values in the index. I would like to filter this dataset down to only show me one instance of each index by selecting the row within the index with the greatest value ...
0
votes
1answer
9 views
How to feed sqlite query data to Pandas scatter_matrix
I am successfully pulling data from a Fitbit sqlite db using Python sqlite3 as follows. I want to create a Pandas scatter_matrix from the data.
My code that successfully gets data is:
import ...
0
votes
0answers
19 views
Creating pandas.DataFrame from numpy.array
Here's what I'm working with:
import numpy as np
import pandas as pd
np.__version__ # '1.8.1'
pd.__version__ # '0.14.1-107-g381a289'
Here is some fake data:
foo = np.arange(5)
bar = ...
0
votes
1answer
15 views
Writing full contents of Pandas dataframe to HTML table
I am embedding links in one column of a Pandas dataframe (table, below) and writing the dataframe to HTML.
Links in the dataframe table are formatted as shown (indexing first link in table):
In: ...
2
votes
1answer
22 views
Resolve Pandas data frame merge conflicts with a function?
Let's say I have two dataframes, which I would like to merge, but there is a conflict because rows and columns overlap. Instead of duplicating the rows, I would like to pass a function to resolve the ...
3
votes
1answer
24 views
Return output of function that takes pandas dataframe as a parameter
I have a pandas dataframe that looks like:
d = {'some_col' : ['A', 'B', 'C', 'D', 'E'],
'alert_status' : [1, 2, 0, 0, 5]}
df = pd.DataFrame(d)
Quite a few tasks at my job require the same ...
2
votes
1answer
27 views
Pandas - read_hdf or store.select returning incorrect results for a query
I have a large dataset (4 million rows, 50 columns) stored via pandas store.append. When I use either store.select or read_hdf with a query for 2 columns being greater than a certain value (i.e. "(a ...
2
votes
2answers
22 views
How to add another label-like column for a dataframe in python?
Suppose I have a dataframe like this:
id openPrice closePrice
1 10.0 13.0
2 20.0 15.0
I want to add another column called 'movement':
if ...
0
votes
1answer
22 views
Installation Package, for using Pandas
I have fairly little experience with Python, but I'm tasked with creating an installation package so that I can distribute a GUI that works with time series of stocks taken from online sources. Based on what ...
0
votes
2answers
28 views
Pandas dataframe merge and element-wise multiplication
I have a dataframe like
df1 = pd.DataFrame({'name':['al', 'ben', 'cary'], 'bin':[1.0, 1.0, 3.0], 'score':[40, 75, 15]})
bin name score
0 1 al 40
1 1 ben 75
2 3 cary 15
...
0
votes
1answer
20 views
Why is my pandas bar chart not symmetrical
I'm running this code:
import pandas as pd
pd.Series([-0.049, 0.039, 0.002, -0.165]).plot(kind='bar')
and getting this:
Why are the bars not centered with the plot area?
-piR
Update:
import ...
0
votes
2answers
26 views
Calculate date variance based on common id
I have a large table that looks like the following:
+---+---------+----------+-------+---------+------------+
| | cust_id | order_id | quant | revenue | date |
...
0
votes
0answers
15 views
Store to HDF, can't store frequency
I have a dataframe that has a custom frequency index, like so:
holidays = CustomBusinessDay(holidays=[ pnd.Timestamp(d) for d in pnd.Series.from_csv(f).values])
timestamps = pnd.date_range(s, e, ...
0
votes
1answer
24 views
.gz file to pandas DataFrame weird delimiter
I am getting a very odd result when I try to load my .gz data file.
My code is pretty simple
dt = pd.read_table(gzip.open(file.gz))
but I get a very odd delimiter. I had expected a tab ('\t') but ...
1
vote
1answer
18 views
Python Parsing HTML Table Generated by JavaScript
I'm trying to scrape a table from the NYSE website (http://www1.nyse.com/about/listed/IPO_Index.html) into a pandas dataframe. In order to do so, I have a setup like this:
def htmltodf(url):
page = ...
1
vote
1answer
39 views
Regression summary output: Order of categories
This question is about the way the result of a GLM is printed, that is, the order in which the coefficients are printed. By "order" I'm not referring to any statistical meaning of this term.
The ...
1
vote
1answer
22 views
Pandas Multiindex not working with read_csv and datetime objects
I have a problem loading a dataframe from csv when I have a multiindex with more than one date in it.
I am running the following code:
import pandas as pd
import datetime
date1 = ...
0
votes
4answers
44 views
Get the mean across multiple Pandas DataFrames
I'm generating a number of dataframes with the same shape, and I want to compare them to one another. I want to be able to get the mean and median across the dataframes.
Source.0 Source.1 ...
1
vote
1answer
30 views
Using pandas to read downloaded html file
As the title says, I tried using read_html, but it gives me the following error:
In [17]:temp = pd.read_html('C:/age0.html',flavor='lxml')
File "<string>", line unknown
XMLSyntaxError: htmlParseStartTag: ...
0
votes
0answers
15 views
Plot graphs in a loop from Pandas Groupby object
I'm quite new to pandas and I'm looking for a solution to plot several graphs with several series in a loop from a Groupby object.
Below is the Groupby object:
gp_df = ...
0
votes
1answer
14 views
pandas: filter intraday df by non-consecutive list of dates
I have dataframes of 1 minute bars going back years (the datetime is the index). I need to get a set of bars covering an irregular (non-consecutive) long list of dates.
For daily bars, I could do ...
-1
votes
0answers
17 views
Alternatives to R's forecast() and ETS() package in Python? [on hold]
I'm looking for a Python alternative to R's ETS() from forecast().
It's my understanding that ETS() is one of the best-performing forecasting programs and I would like to use it. However I am ...
0
votes
3answers
31 views
How to filter in NaN (pandas)?
I have a pandas dataframe (df), and I want to do something like:
newdf = df[(df.var1 == 'a') & (df.var2 == NaN)]
I've tried replacing NaN with np.NaN, or 'NaN' or 'nan' etc, but nothing ...
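One possible approach to the NaN comparison above, as a minimal sketch: NaN never compares equal to anything, including itself, so `== NaN` always yields False, and a null test such as `isna()` is the usual substitute. The sample data here is hypothetical; only the names `df`, `newdf`, `var1` and `var2` come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"var1": ["a", "a", "b"], "var2": [np.nan, 1.0, np.nan]})

# Equality against NaN is always False, so test for missing values explicitly.
newdf = df[(df["var1"] == "a") & (df["var2"].isna())]
print(newdf)
```

Bracket access (`df["var1"]`) is used instead of attribute access so that column names never collide with DataFrame methods.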
0
votes
1answer
21 views
Resampling over dates in both levels of a MultiIndex Pandas DataFrame
I have a pandas DataFrame with a 2-level MultiIndex. Both levels of the MultiIndex are identical date ranges, spaced daily. I want to resample the DataFrame on a weekly basis, for both levels of the ...
3
votes
1answer
29 views
Load list with date value into pandas dataframe and plot activity over time
I have some Twitter data for which I would like to plot activity over time based on the type of tweet (tweet/mention/retweet).
The data is currently loaded into a list of tuples that contains date and ...
1
vote
3answers
32 views
Aggregating unbalanced panel to time series using pandas
I have an unbalanced panel that I'm trying to aggregate up to a regular, weekly time series. The panel looks as follows:
Group Date value
A 1/1/2000 5
A 1/17/2000 ...
0
votes
1answer
35 views
Pandas: Merge array is too big, how to merge in chunks?
When trying to merge two dataframes using pandas I receive this message: "ValueError: array is too big." I estimate the merged table will have about 5 billion rows, which is probably too much for my ...
-1
votes
0answers
18 views
Boxes instead of dash in pandas plot
I am seeing the following (boxes instead of dashes) when plotting with pandas; I was wondering why this is and if there is a way to fix it? Any help would be very much appreciated.
I'm using Mint ...
1
vote
0answers
28 views
Filling in with the last entry of the group
Say I have the following dataframe:
> df
C D E
A B
bar one -1.350006 0.260339 2
three -0.236451 -0.056614 0
flux six ...
0
votes
1answer
28 views
Pandas: Adding x number of Columns together based on value (x) in another column
Test dataset:
df = pd.DataFrame({'A':[2,2,2,], 'B':[2,2,2], 'C':[2,2,2], 'Fields':[3,2,1]})
I need to add the values of 'A', 'B', and 'C' together based on the value in each row of the 'Fields' ...
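A rough sketch of one reading of this question, assuming (from the title) that 'Fields' gives the number of leading columns, in the order 'A', 'B', 'C', to add up in each row. The excerpt is truncated, so that interpretation is an assumption, and the output column name 'Total' is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2, 2, 2], 'B': [2, 2, 2], 'C': [2, 2, 2], 'Fields': [3, 2, 1]})

value_cols = ['A', 'B', 'C']          # assumed column order for the running sum
values = df[value_cols].to_numpy()

# mask[i, j] is True when column j is among the first Fields[i] columns of row i.
mask = np.arange(len(value_cols)) < df['Fields'].to_numpy()[:, None]

# 'Total' is a hypothetical name for the result column.
df['Total'] = (values * mask).sum(axis=1)
print(df)
```

The boolean mask avoids a Python-level loop over rows, which may matter on larger frames.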
0
votes
1answer
24 views
dcast replicate in python
I have a big pandas data frame that looks something like this:
col1 col2 col3 col4
a d sd 2
b sd sd 2
a ds hg 3
a ew ...
2
votes
4answers
46 views
pandas: How to find the max n values for each category in a column
I have a huge municipal library catalog dataset with book title, the library it's in, the library's borough, and the number of times it was loaned out.
I want to find the top 3 most loaned books for ...
0
votes
0answers
22 views
Correct way to check if Pandas DataFrame index is a certain type (DatetimeIndex)
In the code below I want to check if the index in the dataframes is of type DatetimeIndex. Is this a correct way of doing this? Is there a better way to do this than with the if statement? It seems ...
0
votes
1answer
34 views
Matplotlib timelines
I'm looking to take a python DataFrame with a bunch of timelines in it and plot these in a single figure. The DataFrame indices are Timestamps and there's a specific column, we'll call "sequence", ...
0
votes
1answer
35 views
find numeric columns in pandas (python)
Say df is a pandas DataFrame.
I would like to find all columns of numeric type.
Something like:
isNumeric = is_numeric(df)
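A minimal sketch of one way to do this, using `DataFrame.select_dtypes` with NumPy's generic numeric type. The sample frame and the name `numeric_cols` are hypothetical; `is_numeric` from the question is the asker's placeholder, not a pandas function.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two numeric columns and one string column.
df = pd.DataFrame({"x": [1, 2], "y": [0.5, 1.5], "label": ["a", "b"]})

# Keep only columns whose dtype is a subtype of np.number (ints and floats).
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
print(numeric_cols)
```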
1
vote
1answer
35 views
specifying “skip NA” when calculating mean of the column in a data frame created by Pandas
I am learning the Pandas package by replicating the output from some of the R vignettes. Now I am using the dplyr package from R as an example:
...
0
votes
1answer
24 views
Pandas: convert a multiindex column headers into normal column header?
My data frame looks like this, with the column headers being a MultiIndex (True and False are of boolean type).
date a value
id False True
0 2013-11-26 0 ...
1
vote
1answer
27 views
Turn column into multiple columns based on number of commas within string python pandas
I have a dataframe that looks like the below.
Focus on column 9. I'd like to turn each string that comes after a comma into a new column. So in Column 9, row 4, 'Ca., Cal.' I'd like 'Ca.' to remain ...
0
votes
1answer
25 views
Multiindex in pandas pivot table
I am working on a pivot table that looks like this:
Style Site AVS End Qty. \
JP SIZE 116 120 140 ADULTS L M O ...
0
votes
1answer
25 views
Get Date of last record in Date indexed pandas dataframe
I am reading an End of Day price csv file and use the date column to index the dataframe. I want to check the date of the last record. I get the index value location, but have not figured out how to ...
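As a small sketch, assuming the frame is sorted by date so that the last row holds the latest record, the date can be read straight off the index. The prices and the column name `close` below are made up for illustration.

```python
import pandas as pd

# Hypothetical end-of-day prices indexed by date.
df = pd.DataFrame(
    {"close": [10.0, 10.5, 10.2]},
    index=pd.to_datetime(["2014-10-10", "2014-10-13", "2014-10-14"]),
)

# The last index label is the date of the last record.
last_date = df.index[-1]
print(last_date)
```

If the index is not guaranteed to be sorted, `df.index.max()` returns the latest date regardless of row order.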
1
vote
1answer
37 views
Is there a quick function for look-back calculations in a pandas dataframe?
I want to implement a calculation for a simple scenario:
a value computed as the sum of daily data during the previous N days (set N = 3 in the following example)
Dataframe df: (df.index is ...
0
votes
2answers
25 views
Pandas DataFrame Replace every value by 1 except 0
I have a pandas DataFrame like the following.
3,0,1,0,0
11,0,0,0,0
1,0,0,0,0
0,0,0,0,4
13,1,1,5,0
I need to replace every value except '0' with '1'. So my expected output is:
1,0,1,0,0
1,0,0,0,0
...
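One way this could be sketched, assuming the frame holds only numbers: compare against zero and cast the boolean result to integers. The frame is rebuilt here from the rows shown in the question; the name `result` is hypothetical.

```python
import pandas as pd

# Rows copied from the question.
df = pd.DataFrame([
    [3, 0, 1, 0, 0],
    [11, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 4],
    [13, 1, 1, 5, 0],
])

# True where the value is non-zero; casting to int gives 1 for True and 0 for False.
result = (df != 0).astype(int)
print(result)
```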