Tagged Questions
2 votes, 1 answer, 15 views
How to change the x-axis when plotting groups from a pandas groupby combined in one plot
I am processing a chatlog and my data consists of timestamps, usernames and messages. My goal is to plot the number of messages per month for several users, so that I can compare when users were ...
1 vote, 2 answers, 25 views
Rename pandas columns with datetime objects
I have a dataframe with unhelpful column names that I'd like to turn into datetimes. The current column names are:
Index([Market Median, Market Median, Market Median, Market Median, Market Median, ...
0 votes, 2 answers, 23 views
Extracting rows for a Pandas dataframe in Python
I have imported a simple query log into a pandas dataframe in Python (see image), and would like to know what the most efficient way is to extract all of the rows that contain any given keyword that ...
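A minimal sketch of the usual vectorized approach, assuming the query text lives in a column named `query` (hypothetical; the screenshot with the real column names is not available here):

```python
import pandas as pd

# Hypothetical query log; the real column names are not shown in the question.
log = pd.DataFrame({
    "user": [1, 2, 3, 4],
    "query": ["cheap flights", "pandas groupby", "Flights to Paris", None],
})

keyword = "flights"
# Vectorized substring test: case=False ignores case, na=False treats missing
# queries as non-matches, regex=False matches the keyword literally.
matches = log[log["query"].str.contains(keyword, case=False, na=False, regex=False)]
print(matches)
```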
0 votes, 0 answers, 32 views
Python 2.7 - statsmodels - formatting and writing summary output
I'm doing logistic regression on Mac OS X Lion, using pandas 0.11.0 for data handling and statsmodels 0.4.3 for the actual regression.
I'm going to be running ~2,900 different logistic regression ...
1 vote, 1 answer, 24 views
Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset
I'm running Python 2.7 with the Pandas 0.11.0 library installed.
I've been looking around and haven't found an answer to this question, so I'm hoping somebody more experienced than I has a solution.
...
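The rest of the question is cut off, but a common way to do this is to drop rows by the index labels of the subset. A sketch with made-up data, assuming the subset was taken from the same frame and the index labels are unique:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": list("vwxyz")})
# Stand-in for the previously obtained subset (hypothetical filter).
subset = df[df["a"] % 2 == 0]

# Drop those rows by index label; this assumes the index labels are unique.
remaining = df.drop(subset.index)
print(remaining)
```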
1 vote, 1 answer, 33 views
Repeated measures transform in Pandas
Let's say I have a data set from a repeated-measures study, which looks like this:
control dose_high dose_low gender participant
0 4 6 4 m 1
1 3 ...
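The intended transform is cut off, but reshaping this kind of wide table into one row per participant and condition is what `pd.melt` does. A sketch using the first row from the question; the second row (apart from control = 3) and its gender are made up:

```python
import pandas as pd

# First row as in the question; the second row's values after "3" are made up.
wide = pd.DataFrame({
    "control": [4, 3],
    "dose_high": [6, 5],
    "dose_low": [4, 2],
    "gender": ["m", "f"],
    "participant": [1, 2],
})

# One row per participant and condition ("long" format).
long = pd.melt(
    wide,
    id_vars=["participant", "gender"],
    value_vars=["control", "dose_high", "dose_low"],
    var_name="condition",
    value_name="value",
)
print(long)
```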
0 votes, 1 answer, 50 views
Reference previous row when iterating through dataframe
Is there a simple way to reference the previous row when iterating through a dataframe?
In the following dataframe I would like column B to change to 1 when A > 1 and remain at 1 until A < -1, ...
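This "switch on above 1, switch off below -1" rule can be done without iterating: mark the rows where the state changes, leave the rest as NaN, and forward-fill so each row inherits the previous row's state. A sketch with made-up values for A, assuming B starts at 0 before the first switch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.0, 2.0, 0.5, -0.5, -2.0, 0.0, 1.5]})

# 1 where A switches the state on, 0 where it switches it off,
# NaN where the state should carry over from the previous row.
state = pd.Series(np.nan, index=df.index)
state.loc[df["A"] > 1] = 1
state.loc[df["A"] < -1] = 0

# Forward-fill carries the previous state; rows before the first switch get 0 (assumption).
df["B"] = state.ffill().fillna(0).astype(int)
print(df)
```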
1 vote, 2 answers, 34 views
getting specific median from data
I have a DataFrame with columns time, latitude, and longitude. It looks like this:
>>> df.head()
time latitude longitude
0 2011-12-16 08:09:07 42.386391 -71.013544
1 ...
1 vote, 1 answer, 48 views
Groupby Clause in Pandas
I am trying to find the GroupBy clause (on a pandas DataFrame) that can do the following things:
InPlace Transformation.
Add All the Money
If possible then to get the Original Dataframe with Columns "A" ...
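Keeping the original frame while adding a per-group total is what `groupby(...).transform` is for. A sketch assuming a grouping column "A" and an amount column named "Money" (the latter is a guess from "Add All the Money"):

```python
import pandas as pd

# Hypothetical frame: "A" is the grouping column, "Money" the amounts to add up.
df = pd.DataFrame({"A": ["x", "y", "x", "y", "x"], "Money": [10, 5, 1, 2, 3]})

# transform returns a result aligned with the original rows, so it can be
# assigned straight back as a new column without losing the other columns.
df["Total"] = df.groupby("A")["Money"].transform("sum")
print(df)
```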
2 votes, 1 answer, 45 views
Assign to selection in pandas
I have a pandas dataframe and I want to create a new column, that is computed differently for different groups of rows. Here is a quick example:
import pandas as pd
data = {'foo': list('aaade'), ...
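The example data is cut off after `foo`, so the `bar` column and the two formulas below are hypothetical. The pattern is to build a boolean mask per group of rows and assign through `.loc` on both sides:

```python
import pandas as pd

data = {"foo": list("aaade"), "bar": [1, 2, 3, 4, 5]}  # "bar" is hypothetical
df = pd.DataFrame(data)

# Compute the new column differently for different groups of rows.
is_a = df["foo"] == "a"
df.loc[is_a, "baz"] = df.loc[is_a, "bar"] * 10    # rule for the "a" rows
df.loc[~is_a, "baz"] = df.loc[~is_a, "bar"] * -1  # rule for all other rows
print(df)
```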
0 votes, 1 answer, 25 views
how to get the average of dataframe column values
A B
DATE
2013-05-01 473077 71333
2013-05-02 35131 62441
2013-05-03 727 27381
2013-05-04 481 1206
2013-05-05 ...
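Using the four complete rows shown above, `Series.mean` gives the average of one column and `DataFrame.mean` gives one average per column:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [473077, 35131, 727, 481], "B": [71333, 62441, 27381, 1206]},
    index=pd.to_datetime(["2013-05-01", "2013-05-02", "2013-05-03", "2013-05-04"]),
)
df.index.name = "DATE"

col_mean = df["A"].mean()  # average of a single column
all_means = df.mean()      # Series with one average per column
print(col_mean)
print(all_means)
```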
0 votes, 1 answer, 26 views
Using boolean masks in Pandas
This is probably a trivial query but I can't work it out.
Essentially, I want to be able to filter out noisy tweets from a dataframe below
<class 'pandas.core.frame.DataFrame'>
Int64Index: ...
1 vote, 1 answer, 40 views
Pandas group by operations on a data frame
I have a pandas data frame like the one below.
UsrId JobNos
1 4
1 56
2 23
2 55
2 41
2 5
3 78
1 25
3 1
I group by the data frame ...
0 votes, 0 answers, 38 views
pandas dataframe.drop(col,axis=1) does not drop column from column.levels in multiindex dataframe
I have a multiindex dataframe from which I am dropping columns using df.drop(col,axis=1). Then, I am looking through column.levels[0] and doing some operations on all the columns. However, when I try ...
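A sketch of one way around stale level values, with a made-up two-level frame: `MultiIndex.remove_unused_levels()` rebuilds the levels from the labels that are actually present, and `get_level_values(0).unique()` avoids looking at `levels` altogether:

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=cols)

df = df.drop("a", axis=1)
# drop() does not necessarily prune the levels, so "a" may still be listed
# in df.columns.levels[0]; remove_unused_levels() rebuilds them.
df.columns = df.columns.remove_unused_levels()
top = list(df.columns.levels[0])

# Alternative: look only at the labels actually present in the columns.
present = list(df.columns.get_level_values(0).unique())
print(top, present)
```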
1 vote, 1 answer, 29 views
Pandas: Create new dataframe that averages duplicates from another dataframe
Say I have a dataframe my_df with duplicate column names, e.g.
foo bar foo hello
0 1 1 5
1 1 2 5
2 1 3 5
I would like to create another dataframe that averages the duplicates:
foo bar ...
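Reading the example as four columns `foo, bar, foo, hello` with rows `0 1 1 5`, `1 1 2 5`, `2 1 3 5`, one approach is to transpose, group the (now row) labels, average, and transpose back. A sketch on that data:

```python
import pandas as pd

my_df = pd.DataFrame(
    [[0, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]],
    columns=["foo", "bar", "foo", "hello"],
)

# Columns with the same name end up in the same group and are averaged.
# groupby sorts the labels by default, so the result columns are bar, foo, hello.
averaged = my_df.T.groupby(level=0).mean().T
print(averaged)
```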
0 votes, 1 answer, 29 views
MySQL `Load Data Infile Local` fails for .csv unless I open and save the file first. How can I avoid this step?
I generate .csv files using a python script by writing a pandas DataFrame to_csv, using utf8 encoding.
consEx.to_csv(os.path.join(base_dir, "Database/Tables/Consumption ...
2 votes, 2 answers, 40 views
Specifying date format when converting with pandas.to_datetime
I have data in a csv file with dates stored as strings in a standard UK format - %d/%m/%Y - meaning they look like:
12/01/2012
30/01/2012
The examples above represent 12 January 2012 and 30 January ...
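Passing an explicit `format` to `pd.to_datetime` removes the day/month ambiguity for these strings (`read_csv` also has a `dayfirst=True` option when parsing dates while reading):

```python
import pandas as pd

s = pd.Series(["12/01/2012", "30/01/2012"])
# Explicit UK day/month/year format.
dates = pd.to_datetime(s, format="%d/%m/%Y")
print(dates)
```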
1 vote, 1 answer, 44 views
Pandas Panel Slicing - Improving Performance
All,
I'm currently using a panel in pandas to hold my data source. My program is a simple backtesting engine. It is only for personal amusement; however, I'm getting stuck optimizing it.
The ...
1 vote, 3 answers, 45 views
Horizontal stacked bar chart in Matplotlib
I'm trying to create a horizontal stacked bar chart using matplotlib but I can't see how to make the bars actually stack rather than all start on the y-axis.
Here's my testing code.
fig = ...
3 votes, 0 answers, 48 views
Merge on single level of MultiIndex
Is there any way to merge on a single level of a MultiIndex without resetting the index?
I have a "static" table of time-invariant values, indexed by an ObjectID, and I have a "dynamic" table of ...
0 votes, 2 answers, 54 views
pandas convert strings to float for multiple columns in dataframe
I'm new to pandas and trying to figure out how to convert multiple columns which are formatted as strings to float64. Currently I'm doing the below, but it seems like apply() or applymap() should ...
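Selecting several columns and converting them in one assignment avoids a per-column loop. A sketch with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"a": ["1.5", "2"], "b": ["3", "4.25"], "name": ["x", "y"]})
cols = ["a", "b"]  # the string columns to convert (hypothetical names)

# astype converts all selected columns at once and raises if a value cannot be parsed.
df[cols] = df[cols].astype(float)

# To turn unparseable strings into NaN instead, apply pd.to_numeric column-wise:
# df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```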
1 vote, 1 answer, 57 views
HDF5 taking more space than CSV?
Consider the following example:
Prepare the data:
import string
import random
import numpy as np
import pandas as pd
matrix = np.random.random((100, 3000))
my_cols = [random.choice(string.ascii_uppercase) for x in ...
0 votes, 0 answers, 20 views
Unable to save DataFrame to HDF5 (“object header message is too large”)
I have a DataFrame in Pandas:
In [7]: my_df
Out[7]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 34 entries, 0 to 0
Columns: 2661 entries, airplane to zoo
dtypes: float64(2659), object(2)
...
-2 votes, 2 answers, 96 views
Printing all values to a .txt file in Python
I wrote a small script that pulls some unnecessary columns from a text file that I'm working with. I'm not sure how to get it to print to a text file without loss of data.
import pandas as pd
from ...
1 vote, 1 answer, 39 views
Iteratively writing to HDF5 Stores in Pandas
Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files:
Prepare some data:
In [1142]: store = HDFStore('store.h5')
In [1143]: index = date_range('1/1/2000', ...
2 votes, 1 answer, 135 views
Pandas: reshaping data
I have a pandas Series which presently looks like this:
14 [Yellow, Pizza, Restaurants]
...
160920 [Automotive, Auto Parts & Supplies]
160921 [Lighting Fixtures & ...
1 vote, 1 answer, 34 views
Sublists in pandas
I've got a Pandas DataFrame, one of the columns of which looks like this:
0 {u'funny': 2, u'useful': 0, u'cool': 0}
1 {u'funny': 370, u'useful': 487, u'cool': 296}
2 ...
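A column of dicts can be expanded into ordinary columns by building a DataFrame from the list of dicts and joining it back. A sketch where the column names `id` and `votes` are hypothetical:

```python
import pandas as pd

# Hypothetical frame; "votes" stands in for the dict-valued column.
df = pd.DataFrame({
    "id": [10, 11],
    "votes": [{"funny": 2, "useful": 0, "cool": 0},
              {"funny": 370, "useful": 487, "cool": 296}],
})

# Expand each dict into its own columns, keeping the original index for the join.
votes = pd.DataFrame(df["votes"].tolist(), index=df.index)
df = df.drop(columns="votes").join(votes)
print(df)
```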
1 vote, 1 answer, 31 views
pandas read_csv end of section flag
Is there a smart/easy way to tell read_csv in pandas not to load data after a certain "end of section" flag? Or for it to stop if it gets to an empty row?
data = pd.read_csv(path, **params)
eos_line ...
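One approach, sketched here with an in-memory file and a hypothetical `END` flag: find the line number of the flag first, then pass the number of data rows before it as `nrows`. For a file on disk, the same line number can be found by iterating over `open(path)`.

```python
import io

import pandas as pd

raw = "a,b\n1,2\n3,4\nEND\n5,6\n"  # hypothetical file contents and flag
lines = raw.splitlines()
eos_line = lines.index("END")  # 0-based line number of the flag

# The header occupies line 0, so there are eos_line - 1 data rows before the flag.
data = pd.read_csv(io.StringIO(raw), nrows=eos_line - 1)
print(data)
```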
1 vote, 1 answer, 44 views
Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone
You can use the function tz_localize to make a Timestamp or DateTimeIndex timezone aware, but how can you do the opposite: how can you convert a timezone aware Timestamp to a naive one, while ...
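In current pandas, `tz_localize(None)` on a timezone-aware index drops the timezone while keeping the local wall-clock time, whereas `tz_convert(None)` converts to UTC first. A sketch with an arbitrary timezone and timestamps:

```python
import pandas as pd

idx = pd.DatetimeIndex(["2013-05-18 12:00", "2013-05-18 13:00"], tz="Europe/Brussels")

local_naive = idx.tz_localize(None)  # naive, keeps local time: 12:00, 13:00
utc_naive = idx.tz_convert(None)     # naive, in UTC: 10:00, 11:00 (CEST is UTC+2)
print(local_naive)
print(utc_naive)
```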
0 votes, 2 answers, 51 views
HDF5 and SQLite. Concurrency, compression & I/O performance [closed]
I have read in several places that SQLite does not play nicely with NFS, in particular when multiple processes on different machines try to write to the database.
I need a storage ...
0 votes, 1 answer, 47 views
Pandas: Period object to abstract from time
I have the following DataFrame:
df = pd.DataFrame({
'Trader': 'Carl Mark Carl Joe Mark Carl Max Max'.split(),
'Share': list('ABAABAAA'),
'Quantity': [5,2,5,10,1,5,2,1]
}, index=[
...
1 vote, 1 answer, 30 views
Do non-unique indexes provide any performance advantage in pandas?
From the pandas documentation, I've gathered that unique-valued indices make certain operations efficient, and that non-unique indices are occasionally tolerated.
From the outside, it doesn't look ...
1 vote, 1 answer, 29 views
Pandas: What are the cases when count returned by DataFrame describe is a floating point
When describing my pandas DataFrame, I get the following result:
Mains_1_Power Mains_2_Power
count 17.000000 17.000000
mean 57.063528 200.428607
std 67.605151 ...
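As far as I can tell, the count is never fractional: `describe()` puts the count in the same column as mean and std, so it is stored as float64. `DataFrame.count()` returns the integer counts. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"Mains_1_Power": [1.5, 2.0, None], "Mains_2_Power": [3.0, 4.0, 5.0]})

summary = df.describe()  # count, mean, std, ... share one float64 column per input column
counts = df.count()      # integer count of non-null values per column
print(summary.loc["count"])
print(counts)
```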
0 votes, 1 answer, 30 views
Combine sparsely populated columns of the same data in pandas
I have the following dataframe and I would like to combine columns 2,3,4,5 into just one column.
| 0 | 1 | 2 | 3 | 4 | 5 |
+-----+-----+-----+-----+-----+-----+
| 90 | 90 | A | | ...
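The table is cut off, so this sketch assumes each row has at most one non-missing value across columns 2 to 5 and that the blanks are missing values (`None`/NaN) rather than empty strings. Chaining `combine_first` keeps the first non-missing value found:

```python
from functools import reduce

import pandas as pd

# Hypothetical data: columns 2-5 hold the same kind of value, at most one per row.
df = pd.DataFrame({
    0: [90, 91, 92],
    1: [90, 91, 92],
    2: ["A", None, None],
    3: [None, "B", None],
    4: [None, None, "C"],
    5: [None, None, None],
})

sparse_cols = [2, 3, 4, 5]
# combine_first keeps the left value and fills its gaps from the right one.
combined = reduce(lambda left, right: left.combine_first(right),
                  (df[c] for c in sparse_cols))
result = df.drop(columns=sparse_cols).assign(combined=combined)
print(result)
```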
0 votes, 1 answer, 75 views
Concatenating and sorting thousands of CSV files
I have thousands of CSV files on disk. Each is approximately 10 MB (about 10K columns). Most of these columns hold real (float) values.
I would like to create a dataframe by ...
1 vote, 1 answer, 43 views
pandas: how to select by partial label in index
Having a series like this:
ds = Series({'wikipedia':10,'wikimedia':22,'wikitravel':33,'google':40})
google 40
wikimedia 22
wikipedia 10
wikitravel 33
dtype: int64
I would like to ...
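Two ways to select by partial label on the series above: a boolean mask built from the index's string methods, or `Series.filter(like=...)` for a substring match anywhere in the label:

```python
import pandas as pd

ds = pd.Series({"wikipedia": 10, "wikimedia": 22, "wikitravel": 33, "google": 40})

# Labels that start with "wiki".
wiki = ds[ds.index.str.startswith("wiki")]

# Labels that contain "wiki" anywhere.
wiki_like = ds.filter(like="wiki")
print(wiki)
```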
1 vote, 2 answers, 45 views
Deleting all columns except a few python-pandas
Say I have a data table
1 2 3 4 5 6 .. n
A x x x x x x .. x
B x x x x x x .. x
C x x x x x x .. x
And I want to slim it down so that I only have, say, columns 3 ...
2 votes, 3 answers, 52 views
Getting data from .csv file python (panda)
I am working on a python project where I have a .csv file like this.
freq,ae,cl,ota
825,1,2,3
835,4,5,6
850,10,11,12
880,22,23,24
910,46,47,48
960,94,95,96
1575,190,191,192
1710,382,383,384
...
-2 votes, 0 answers, 51 views
Why accessing larger index in pandas series takes longer? [closed]
I have a function that I would like to apply element-wise to several series.
def my_fun(s1, s2, p1, p2, p3, angle_cutoff, s_cutoff):
a1 = xy2angle(p1, s1)
a2 = xy2angle(p2, s2)
if ...
4 votes, 2 answers, 69 views
Is there universal if function in numpy?
I have three series. I need to do the following operation element-wise:
Compare values from the first and second series.
If the first is larger, take the arcsine of the element from the third series.
...
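`numpy.where` is the element-wise "if". The question's "else" step is cut off, so the `0.0` fallback below is a placeholder, and the series values are made up. Note that `np.where` evaluates both branches for every element, so `np.arcsin` runs even where its result is discarded:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([0.9, 0.1, 0.5])
s2 = pd.Series([0.2, 0.8, 0.4])
s3 = pd.Series([1.0, 0.5, 0.0])

# Where s1 > s2 take arcsin(s3), otherwise the placeholder value 0.0.
result = pd.Series(np.where(s1 > s2, np.arcsin(s3), 0.0), index=s1.index)
print(result)
```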
0 votes, 1 answer, 27 views
How to define a function in pandas that takes series as an argument?
I have a data frame and I want to create a new column whose values are defined by values located in other columns (in the same row). It is very simple if I use simple operations (+, -, * and even ...
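A function can take whole Series as long as every operation inside it is vectorized. Functions from the `math` module only accept scalars, so the NumPy equivalents are used instead. A sketch with hypothetical columns `x` and `y`, plus a slower row-by-row fallback for logic that cannot be vectorized:

```python
import math

import numpy as np
import pandas as pd

def hypot_col(x, y):
    # np.sqrt is a ufunc: it accepts whole Series and works element-wise.
    # math.sqrt would raise TypeError here because it only accepts scalars.
    return np.sqrt(x ** 2 + y ** 2)

df = pd.DataFrame({"x": [3.0, 5.0], "y": [4.0, 12.0]})
df["r"] = hypot_col(df["x"], df["y"])

# Fallback: call a scalar function once per row (slower).
df["r_slow"] = df.apply(lambda row: math.sqrt(row["x"] ** 2 + row["y"] ** 2), axis=1)
print(df)
```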
1 vote, 2 answers, 62 views
What is the most idiomatic way to index an object with a boolean array in pandas?
I am particularly talking about Pandas version 0.11 as I am busy replacing my uses of .ix with either .loc or .iloc. I like the fact that differentiating between .loc and .iloc communicates whether I ...
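A sketch of the label-based form, with made-up data: `.loc` accepts a boolean Series (aligned on the index) or a plain NumPy boolean array; for positional selection the NumPy array is passed to `.iloc`:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 5, 3]}, index=["a", "b", "c"])
mask = df["x"] > 2

selected = df.loc[mask]                   # boolean Series, aligned on the index
selected_arr = df.loc[mask.to_numpy()]    # plain NumPy boolean array
selected_pos = df.iloc[mask.to_numpy()]   # positional selection with the same array
print(selected)
```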
1 vote, 1 answer, 40 views
Stop pandas plot from doing new x-axis layout
I have a problem of an automatic x-axis rescaling happening when I do the following:
plot column 1
plot column 1 where column 2 is notnull, but with different style.
The second plot keeps ...
1 vote, 1 answer, 37 views
Python 3.3 pandas, pip-3.3
So, I'm trying to install pandas for Python 3.3 and have been having a really hard time, between Python 2.7 and Python 3.3 and other factors.
Some pertinent information: I am running Mac OS X Lion ...
0 votes, 1 answer, 61 views
Reading Files in HDFS (Hadoop filesystem) directories into a Pandas dataframe
I am generating some delimited files from hive queries into multiple HDFS directories. As the next step, I would like to read the files into a single pandas dataframe in order to apply standard ...
1 vote, 1 answer, 29 views
Appending to an empty data frame in Pandas?
Is it possible to append to an empty data frame that doesn't contain any indices or columns?
I have tried to do this, but keep getting an empty dataframe at the end.
e.g.
df = pd.DataFrame()
data = ...
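`DataFrame.append` never modified the frame in place (it returned a new one), and it has since been removed from pandas. A sketch of the usual replacement, with hypothetical records: collect the rows first and build the frame once, or `pd.concat` a list of frames in one call:

```python
import pandas as pd

# Hypothetical records; in the question these come from `data = ...`.
records = [{"i": i, "square": i * i} for i in range(3)]

# Collect rows first, then build the frame once.
df = pd.DataFrame(records)

# If the pieces are already DataFrames, concatenate the whole list in one call.
pieces = [pd.DataFrame([r]) for r in records]
df2 = pd.concat(pieces, ignore_index=True)
print(df2)
```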
0 votes, 2 answers, 54 views
Having issues reading a .csv file python-pandas
I'm trying to read this .txt file in pandas and this is my result. I thought (naively) that I was getting the hang of this stuff last night, but apparently I'm wrong. If I simply run
rebull = ...
2 votes, 1 answer, 57 views
Pandas Convert 'NA' to NaN
I just picked up pandas to do some data analysis work for my biology research. It turns out one of the proteins I'm analyzing is called 'NA'.
I have a matrix with pairwise 'HA, M1, M2, NA, NP...' on ...
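`read_csv` treats the string "NA" as missing by default. Setting `keep_default_na=False` turns off that default list, and `na_values` can then name only the strings that really mean "missing". A sketch with made-up file contents:

```python
import io

import pandas as pd

raw = "protein,score\nHA,1.5\nNA,2.0\nM1,\n"  # hypothetical file contents

# Only empty fields become NaN; the protein name "NA" is kept as a string.
df = pd.read_csv(io.StringIO(raw), keep_default_na=False, na_values=[""])
print(df)
```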
0 votes, 2 answers, 54 views
rename index of a pandas dataframe
I have a pandas dataframe whose indices look like:
df.index
['a_1', 'b_2', 'c_3', ... ]
I want to rename these indices to:
['a', 'b', 'c', ... ]
How do I do this without specifying a dictionary ...
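`rename` also accepts a function, which is applied to every label, so no dictionary is needed. A sketch assuming the part to keep is everything before the first underscore:

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3]}, index=["a_1", "b_2", "c_3"])

# Keep the text before the first underscore in each index label.
df = df.rename(index=lambda label: label.split("_")[0])
# Vectorized alternative: df.index = df.index.str.split("_").str[0]
print(df)
```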
-2 votes, 1 answer, 45 views
Pivoting duplicate columns into rows
This is the input file I have from reading a csv file:
Sample Info D3S1358 1 D3S1358 2 TH01 1 TH01 2 D21S11 1 D21S11 2 D21S11 3
TEST_646 17 ...