Tagged Questions
1 vote, 1 answer, 22 views
pandas multiple plots not working as hists
With a dataframe df in Pandas, I am trying to plot histograms on the same page, filtering by 3 different variables; the intended outcome is histograms of the values for each of the three types. The ...
0 votes, 1 answer, 42 views
Python Pandas sum up values from different columns
I'm trying to take values stored in a list in one column and multiply them by values stored in a list in another column.
For example, to print all the cores for each user, I do this:
print ...
0 votes, 1 answer, 42 views
Global variables for many classes vs many equivalent class attributes?
Firstly, I realize that there are already many questions about efficiency out there, so I apologize if this is a duplicate, but I'm here because I couldn't find what I was looking for. I'm going to ...
1 vote, 1 answer, 39 views
Pandas good approach to get top n records within each group
Suppose I have pandas DataFrame like this:
>>> df = pd.DataFrame({'id':[1,1,1,2,2,2,2,3,4],'value':[1,2,3,1,2,3,4,1,1]})
>>> df
id value
0 1 1
1 1 2
2 1 3
3 ...
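A minimal sketch of one common approach, assuming "top n" means the n largest `value` rows per `id` (n = 2 is chosen only for illustration): sort first, then let `groupby(...).head(n)` keep the leading rows of each group.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 4],
                   'value': [1, 2, 3, 1, 2, 3, 4, 1, 1]})

# Sort so the largest values come first inside each id,
# then keep the first two rows of every group
top2 = (df.sort_values(['id', 'value'], ascending=[True, False])
          .groupby('id')
          .head(2)
          .reset_index(drop=True))
print(top2)
```

`head` keeps groups with fewer than n rows intact (ids 3 and 4 here), which `nlargest`-style approaches also do.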
1 vote, 1 answer, 32 views
Pandas dataframe get first row of each group
I have a pandas DataFrame like the following.
df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
'value' : ["first","second","second","first",
...
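A short sketch using hypothetical, shortened data (the question's frame is truncated): `groupby('id').head(1)` returns the first row of each group and keeps the original index, whereas `groupby('id').first()` returns the first non-null value per column, indexed by `id`.

```python
import pandas as pd

# Hypothetical data in the spirit of the question
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'value': ['first', 'second', 'second', 'first', 'second', 'first']})

# First row of every id group, original index preserved
first_rows = df.groupby('id').head(1)
print(first_rows)
```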
1 vote, 2 answers, 27 views
Pandas: drop_duplicates with condition
Is there any way to use drop_duplicates together with conditions? For example, let's take the following Dataframe:
import pandas as pd
df = pd.DataFrame({
'Customer_Name': ['Carl', 'Carl', 'Mark', ...
1 vote, 1 answer, 33 views
mapping of one column with another two
I have a Pandas Dataframe with three columns that follows this structure:
Employee  email    Manager
Smith     [email]  Johnson
Doe       [email]  ...
2 votes, 1 answer, 42 views
How to calculate a new field in python using a linear relationship
I am new to Python, working with Python 2.7.5. After I read a CSV file in Python using the code below:
df = csv.DictReader(open("C:\\Users\\user\\Documents\\file.csv"))
I want to calculate a new ...
1 vote, 1 answer, 27 views
Convert one row of a pandas dataframe into multiple rows
I want to turn this:
age id val
0 99 1 0.3
1 99 2 0.5
2 99 3 0.1
Into this:
age id val
0 25 1 0.3
1 50 1 0.3
2 75 1 0.3
3 25 2 0.5
4 50 2 0.5
5 ...
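One way to get this shape is a cross join against the replacement ages. This sketch assumes the truncated desired output continues the visible pattern (each id gets ages 25, 50 and 75); `merge(how='cross')` needs pandas 1.2 or later.

```python
import pandas as pd

df = pd.DataFrame({'age': [99, 99, 99], 'id': [1, 2, 3], 'val': [0.3, 0.5, 0.1]})

# Ages every row should be expanded into (assumed from the desired output)
ages = pd.DataFrame({'age': [25, 50, 75]})

# Cross join: each (id, val) row is paired with every age, then columns are reordered
out = (df.drop(columns='age')
         .merge(ages, how='cross')
         [['age', 'id', 'val']])
print(out)
```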
3 votes, 2 answers, 42 views
Pandas OR statement ending in series contains
I have a DataFrame df that has columns type and subtype and about 100k rows. I'm trying to classify what kind of data df contains by checking type / subtype combinations. While df can contain many ...
1 vote, 1 answer, 23 views
Pandas CSV file with occasional extra columns in the middle
I'm processing lots (thousands) of ~100k line csv files that are produced by someone else. 9 times out of 10 the files have 8 columns and all is right with the world. The 10th time or so ~10 lines ...
1 vote, 2 answers, 36 views
Conditional merge for CSV files using python (pandas)
I am trying to merge >=2 files with the same schema.
The files will contain duplicate entries but rows won't be identical, for example:
file1:
store_id,address,phone
9191,9827 Park st,999999999
...
0 votes, 1 answer, 18 views
How can I select which multi-index axis splits the data in a groupby object across different subplots?
I'm working with a pandas.groupby object to which I have applied a function as such:
x = data.groupby(['congruent', 'contrast']).apply(lambda s: s.mean())[['cresp1', 'cresp2']]
Output of print x:
...
0 votes, 2 answers, 18 views
Pandas performing a SQL subtraction between two dataframes
I have two dataframes. First there is DF1:
ID Other value
1 a
2 b
3 c
and then there is DF2, which is a subset of DF1:
ID Other value
1 a
I want ...
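A sketch of the SQL-style subtraction as an anti-join, assuming the goal is the rows of DF1 that do not appear in DF2: a left merge with `indicator=True` labels each row, and only the `left_only` rows are kept. When only `ID` matters, `df1[~df1['ID'].isin(df2['ID'])]` is a shorter alternative.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Other value': ['a', 'b', 'c']})
df2 = pd.DataFrame({'ID': [1], 'Other value': ['a']})

# Merge on all shared columns; '_merge' records where each row was found
merged = df1.merge(df2, how='left', indicator=True)

# Keep rows present only in df1, then drop the helper column
result = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(result)
```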
2 votes, 1 answer, 35 views
PYODBC to Pandas - DataFrame not working - Shape of passed values is (x,y), indices imply (w,z)
I used pyodbc with Python before, but now I have installed it on a new machine (Windows 8 64-bit, Python 2.7 64-bit, PythonXY with Spyder).
Before I used to (at the bottom you can find more real ...
1 vote, 1 answer, 21 views
Difference between log(dataframe) in IPython and in execution
I have a pandas data frame as an attribute in Python 2.7, called probs. If I try to execute
log(self.prob['AAA'])
(where AAA is a valid name for one of the columns in the data frame), I get the ...
0 votes, 1 answer, 50 views
removing NaN values in python pandas
The data is adult income from census data; rows look like:
31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K
48, Self-emp-not-inc, ...
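A minimal sketch, assuming the file is comma-plus-space separated with no header. The first row is from the question; the second row is invented for illustration. `skipinitialspace=True` strips the blank after each comma so that `NaN` is recognised as missing, and `dropna()` then removes every row containing a missing value.

```python
import io
import pandas as pd

# Row 1 from the question; row 2 is hypothetical
raw = io.StringIO(
    "31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, "
    "White, Male, 0, 0, 38, NaN, >50K\n"
    "40, Private, 12345, Bachelors, 13, Never-married, Sales, Not-in-family, "
    "White, Female, 0, 0, 40, United-States, <=50K\n"
)

df = pd.read_csv(raw, header=None, skipinitialspace=True)

# Drop rows with at least one missing value
clean = df.dropna()
print(clean)
```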
1 vote, 1 answer, 49 views
averaging every five minutes data as one datapoint in pandas dataframe
I have a Dataframe in Pandas like this
2013-10-09 09:00:05
2013-10-09 09:01:00
2013-10-09 09:02:00
...
2013-10-10 09:15:05
2013-10-10 ...
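With a `DatetimeIndex`, `resample('5min').mean()` averages each five-minute bin into one point, and it also works on irregular timestamps like those above. The values below are hypothetical, since the question's data is not shown.

```python
import pandas as pd

# Hypothetical one-minute readings
idx = pd.date_range('2013-10-09 09:00:00', periods=10, freq='min')
s = pd.Series(range(10), index=idx, dtype=float)

# One averaged data point per 5-minute bin
five_min = s.resample('5min').mean()
print(five_min)
```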
1 vote, 2 answers, 62 views
Suppress or remove columns named 'index' from Pandas dataframe
I am trying to create a dataframe from three parent (or source) dataframes (each created from a .csv file), but when writing the resulting dataframe to a file or printing on screen, columns named ...
0 votes, 1 answer, 43 views
iterate sort within groupby
I would like to sort this Series within each level of col_0
import pandas as pd
a = 'a b b a a a a b b'.split()
b = 'b a b b b a a b b'.split()
aS = pd.Series(a)
bS = pd.Series(b)
ctab = ...
1 vote, 2 answers, 41 views
Pandas reindexing data frame issue
Say I have the following data frame:
A B
0 1986-87 232131
1 1987-88 564564
2 1988-89 123125
...
And so on.
I'm trying to reindex, with ...
1 vote, 1 answer, 23 views
Pandas read_table() thousands=',' not working
I'm trying to read in some population data as an exercise to learn pandas:
>>> countries = pd.read_table('country_data.txt',
thousands=',',
...
2 votes, 1 answer, 35 views
pandas - reading multiple JSON records into dataframe
I'd like to know if there is a memory-efficient way of reading a multi-record JSON file (each line is a JSON dict) into a pandas dataframe. Below is a 2-line example with a working solution, I need it ...
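pandas can parse newline-delimited JSON directly with `read_json(..., lines=True)`. The two records here are hypothetical; for large files, adding `chunksize=` (which requires `lines=True`) returns an iterator of smaller frames instead of loading everything at once.

```python
import io
import pandas as pd

# Two hypothetical records, one JSON object per line
raw = io.StringIO('{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n')

# lines=True treats each line as a separate record
df = pd.read_json(raw, lines=True)
print(df)
```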
1 vote, 1 answer, 40 views
Filter by hour in Pandas
How can I filter a DataFrame indexed by datetime so that I get only the entries within certain hours of every day?
I am looking for something equivalent to the following R code for an xts object
...
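For a frame indexed by datetime, `between_time` keeps only the rows whose time of day falls in a window, regardless of the date. The data and the 09:00 to 17:00 window below are hypothetical.

```python
import pandas as pd

# Hypothetical hourly data covering two days
idx = pd.date_range('2013-01-01', periods=48, freq='60min')
df = pd.DataFrame({'x': range(48)}, index=idx)

# Rows between 09:00 and 17:00 on every day (both ends inclusive by default)
office_hours = df.between_time('09:00', '17:00')
print(office_hours)
```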
5 votes, 2 answers, 72 views
Insert a link inside a pandas table
I'd like to insert a link (to a web page) inside a pandas table, so that when it is displayed in the IPython notebook I can click the link.
I tried the following:
In [1]: import pandas as pd
In [2]: df = ...
1 vote, 2 answers, 37 views
Python Pandas max value of selected columns
data = {'name' : ['bill', 'joe', 'steve'],
'test1' : [85, 75, 85],
'test2' : [35, 45, 83],
'test3' : [51, 61, 45]}
frame = pd.DataFrame(data)
I would like to add a new column that shows ...
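Assuming the new column should hold each row's highest test score (the sentence is truncated), selecting the test columns and calling `max(axis=1)` computes a row-wise maximum while ignoring `name`. The column name `max_test` is hypothetical.

```python
import pandas as pd

data = {'name': ['bill', 'joe', 'steve'],
        'test1': [85, 75, 85],
        'test2': [35, 45, 83],
        'test3': [51, 61, 45]}
frame = pd.DataFrame(data)

# Row-wise maximum over the selected columns only
frame['max_test'] = frame[['test1', 'test2', 'test3']].max(axis=1)
print(frame)
```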
1 vote, 1 answer, 35 views
python pandas text block to data frame mixed types
I am a python and pandas newbie. I have a text block that has data arranged in columns. The data in the first six columns are integers and the rest are floating point. I tried to create two DataFrames ...
1 vote, 1 answer, 48 views
Append string to the start of each value in a said column of a pandas dataframe (elegantly)
I would like to prepend a string to each value in a given column of a pandas dataframe (elegantly).
I already figured out how to kind-of do this and I am currently using:
df.ix[(df['col'] ...
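A vectorised sketch: string concatenation broadcasts over a Series, and `astype(str)` makes it work for non-string columns too. The prefix `'str_'` and the data are hypothetical. Note that `.ix` has been removed from modern pandas; `.loc` is the replacement when a condition is needed.

```python
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 3]})

# Prepend the same prefix to every value in the column
df['col'] = 'str_' + df['col'].astype(str)
print(df)
```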
0 votes, 2 answers, 62 views
Vectorizing a Pandas dataframe for Scikit-Learn
Say I have a dataframe in Pandas like the following:
> my_dataframe
col1 col2
A foo
B bar
C something
A foo
A bar
B foo
where rows represent instances, and ...
2 votes, 1 answer, 46 views
Interpolating a series with float index
I have the following data frame
density A2 B2
0 20 1 0.525
1 30 1 0.577
2 40 1 0.789
3 50 1 1.000
4 75 1 1.000
5 100 1 1.000
I'm trying ...
1 vote, 1 answer, 27 views
Calculating rolling_std on 4 columns in python pandas to calculate a Bollinger Band?
I'm just getting into Pandas, trying to do what I would easily do in Excel, just with a large data set. I have a selection of futures price data that I have input into Pandas using:
df = ...
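The old `pd.rolling_std` function has been replaced by the `.rolling(window).std()` method, which works on a Series or column-wise on a whole DataFrame. This sketch builds Bollinger-style bands on hypothetical closing prices; the window of 3, the width of 2 standard deviations, and the default sample standard deviation (`ddof=1`) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical closing prices; a short window keeps the example small
close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0])
window = 3

mid = close.rolling(window).mean()   # moving average (middle band)
std = close.rolling(window).std()    # rolling sample standard deviation
upper = mid + 2 * std
lower = mid - 2 * std
print(pd.DataFrame({'mid': mid, 'upper': upper, 'lower': lower}))
```

The first `window - 1` values are NaN because the window is not yet full.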
1 vote, 1 answer, 25 views
Modifying number of ticks on Pandas hourly time axis
If I have the following example Python code using a Pandas dataframe:
import pandas as pd
from datetime import datetime
ts = pd.DataFrame(randn(1000), index=pd.date_range('1/1/2000 00:00:00', ...
1 vote, 2 answers, 46 views
join three pandas data frames into one?
Here are my pandas DataFrames:
pandas1 = pandas.DataFrame([1,2,3,4,5,6,7,8,9])
pandas2 = pandas.DataFrame([10,20,30,40,50,60,70,80,90])
pandas3 = ...
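Assuming "join" means placing the frames side by side on their shared 0..8 index, `pd.concat(..., axis=1)` does it in one call. `pandas3` is truncated in the question, so the one below is hypothetical, as are the column names.

```python
import pandas as pd

pandas1 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9])
pandas2 = pd.DataFrame([10, 20, 30, 40, 50, 60, 70, 80, 90])
pandas3 = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900])  # hypothetical

# Align on the index and stack the frames horizontally
combined = pd.concat([pandas1, pandas2, pandas3], axis=1)
combined.columns = ['a', 'b', 'c']  # every input used column 0, so rename to avoid duplicates
print(combined)
```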
2 votes, 2 answers, 40 views
Trouble with using iloc in pandas dataframe with hierarchical index
I'm getting this ValueError whenever I try to give a list to iloc on a dataframe with a hierarchical index. I'm not sure if I'm doing something wrong or if this is a bug. I haven't had any issues ...
1 vote, 1 answer, 36 views
Python Pandas isin return index
I have a pandas DataFrame df with a list of unique ids id, and a DataFrame with a master list of all known ids master_df.id. I'm trying to figure out the best way to perform an isin that also returns to ...
1 vote, 2 answers, 23 views
pandas timeseries identification values based on date index
I have a pandas 30min interval timeseries.
A small sample looks like:
2009-12-02 20:00:00 0.6
2009-12-02 20:30:00 0.7
2009-12-03 01:00:00 0.7
2009-12-03 02:30:00 0.7
2009-12-03 11:30:00 ...
1 vote, 1 answer, 34 views
python pandas: How to simplify the result of groupby('column_name').count()
Quick one: imagine we have a df which contains Walmart's global sales contacts with, say, 20 columns. What I want to do is very simple: figure out how many rows there are for each country. Naively, I will ...
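`groupby(...).count()` returns one count per column, which is why the result looks bloated. `size()` returns a single number per group instead. The table below is hypothetical; only the `country` column matters, and `df['country'].value_counts()` gives the same counts sorted by frequency.

```python
import pandas as pd

# Hypothetical contacts table
df = pd.DataFrame({'country': ['US', 'CN', 'US', 'UK', 'US', 'CN'],
                   'name': ['a', 'b', 'c', 'd', 'e', 'f']})

# One row count per country, as a Series indexed by country
per_country = df.groupby('country').size()
print(per_country)
```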
3 votes, 1 answer, 25 views
Combine date column and time column into datetime column
I have a Pandas dataframe like this (obtained by parsing an Excel file):
| | COMPANY NAME | MEETING DATE | MEETING TIME|
...
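A sketch assuming both columns hold text (the question's rows are not shown, so the values are hypothetical): concatenate the date and time and parse once with `pd.to_datetime`. If the Excel parser produced `datetime` or `time` objects instead, converting each column with `astype(str)` first is one way to get back to text.

```python
import pandas as pd

# Hypothetical values for the two columns
df = pd.DataFrame({'MEETING DATE': ['2013-11-15', '2013-11-18'],
                   'MEETING TIME': ['09:30:00', '14:00:00']})

# Join date and time as text, then parse into a single datetime64 column
df['MEETING DATETIME'] = pd.to_datetime(
    df['MEETING DATE'].astype(str) + ' ' + df['MEETING TIME'].astype(str))
print(df)
```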
0 votes, 2 answers, 41 views
python datetime: How to get next period (using aliases such as 'D', 'M') [duplicate]
Is there any way to get from a date to the next period? I.e. I am looking for a function next that takes
now = datetime.datetime(2013, 11, 15, 0, 0)
to
next(now, 'D') = datetime.datetime(2013, ...
2 votes, 2 answers, 69 views
Print different precision by column with pandas.DataFrame.to_csv()?
Question
Is it possible to specify a float precision specifically for each column to be printed by the Python pandas package method pandas.DataFrame.to_csv?
Background
If I have a pandas dataframe ...
0 votes, 1 answer, 31 views
Calculate Daily Returns with Pandas DataFrame
Here is my Pandas data frame:
prices = pandas.DataFrame([1035.23, 1032.47, 1011.78, 1010.59, 1016.03, 1007.95,
1022.75, 1021.52, 1026.11, 1027.04, 1030.58, 1030.42,
...
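Assuming "daily return" means the simple return p_t / p_(t-1) - 1 (the question does not define it), `pct_change()` computes it for each row; the first row is NaN because it has no previous price. Only the first four prices from the question are used here, and the column name `price` is hypothetical.

```python
import pandas as pd

# First four prices from the question
prices = pd.DataFrame([1035.23, 1032.47, 1011.78, 1010.59], columns=['price'])

# Simple return relative to the previous row
returns = prices.pct_change()
print(returns)
```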
1 vote, 1 answer, 42 views
How to select from different columns conditionally in pandas
I have a pandas DataFrame shaped Nx5, like
['','','A','','']
['','C','','','']
['','A','','','']
['','','','T','']
.
.
.
I want to convert it to Nx1 shape getting non-empty values
['A']
['C']
...
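Assuming every row has exactly one non-empty string, as in the example, joining each row's cells collapses it to that value. `to_frame()` turns the resulting Series into the requested Nx1 shape.

```python
import pandas as pd

# First four rows of the N x 5 frame from the question
df = pd.DataFrame([['', '', 'A', '', ''],
                   ['', 'C', '', '', ''],
                   ['', 'A', '', '', ''],
                   ['', '', '', 'T', '']])

# Concatenate the cells of each row; empty strings contribute nothing
result = df.apply(''.join, axis=1)
print(result.to_frame())
```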
1 vote, 2 answers, 24 views
Losing date index from dataframe in Pandas
I am trying to convert a resampled (hourly) pandas dataframe, indexed by daterun, into tuples. Here is the dataframe:
ratetype p_rate v_rate
daterun ...
1 vote, 2 answers, 37 views
Use ternary operator in apply function in pandas dataframe, without grouping columns
How can I use a ternary operator in the lambda function within the apply function of a pandas dataframe?
First of all, this code is from R/plyr, which is exactly what I want to get:
ddply(mtcars, .(cyl), ...
1 vote, 1 answer, 21 views
Insert values into pandas dataframe based on MultiIndex
I have a MultiIndex pandas DataFrame as follows:
df = pandas.DataFrame({"index": ["a", "a", "a", "b", "b", "b"], "id": [1,2,3,4,5,6], "name": ["jim", "jim", "jim", "bob", "bob", "bob"], ...
1 vote, 3 answers, 64 views
Run an OLS regression with Pandas Data Frame
I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example:
import pandas as pd
df = pd.DataFrame({"A": ...
0 votes, 2 answers, 32 views
Pandas: How to access the value of the index
I have a dataframe and would like to use the values in the index to create another column.
For instance:
df=pd.DataFrame({'idx1':range(0,5), 'idx2':range(10000,10005), 'value':np.random.randn(5)})
...
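The excerpt is truncated, so this sketch assumes `idx1` and `idx2` are turned into a MultiIndex. `index.get_level_values` returns one level as an Index, which can be used in arithmetic and assigned positionally to a new column; the column name `idx2_plus_1` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'idx1': range(0, 5), 'idx2': range(10000, 10005),
                   'value': np.random.randn(5)})
df = df.set_index(['idx1', 'idx2'])  # assumption: both columns become index levels

# Read one index level and derive a new column from it
df['idx2_plus_1'] = df.index.get_level_values('idx2') + 1
print(df)
```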
1 vote, 0 answers, 41 views
calling apply() on an empty pandas DataFrame
I'm having a problem with the apply() method of the pandas DataFrame. My issue is that apply() can return either a Series or a DataFrame, depending on the return type of the input function; however, ...
0 votes, 1 answer, 26 views
add offset to the datetime64 column in a data frame
This is really a quick one:
I am migrating from q to pandas, and I am trying to add 1 nanosecond to each item in the Date column of the data frame 'spy'
>>> spy
<class ...
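Adding a `pd.Timedelta` to a `datetime64` column shifts every timestamp at once. The `spy` frame below is a hypothetical stand-in containing only a `Date` column.

```python
import pandas as pd

# Hypothetical stand-in for 'spy'
spy = pd.DataFrame({'Date': pd.to_datetime(['2013-11-15 09:30:00',
                                            '2013-11-15 09:30:01'])})

# Shift every timestamp by one nanosecond
spy['Date'] = spy['Date'] + pd.Timedelta(1, unit='ns')
print(spy['Date'].iloc[0].nanosecond)
```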
1 vote, 0 answers, 30 views
automatically updating columns in pandas?
In my mind, pandas is providing me with a virtual spreadsheet, like Excel. One thing about Excel spreadsheets is that you can set a column to a function. For instance
T_c T T_r
Series ...