2
votes
1answer
15 views

How to change the x-axis when plotting groups from a pandas groupby combined in one plot

I am processing a chatlog and my data consists of timestamps, usernames and messages. My goal is to plot the number of messages per month for several users, so that I can compare when users were ...
1
vote
2answers
25 views

Rename pandas columns with datetime objects

I have a dataframe with unhelpful column names that I'd like to turn into datetimes. The current column names are Index([Market Median, Market Median, Market Median, Market Median, Market Median, ...
0
votes
2answers
23 views

Extracting rows for a Pandas dataframe in Python

I have imported a simple query log into a pandas dataframe in Python (see image), and would like to know what the most efficient way is to extract all of the rows that contain any given keyword that ...
0
votes
0answers
32 views

Python 2.7 - statsmodels - formatting and writing summary output

I'm doing logistic regression using pandas 0.11.0(data handling) and statsmodels 0.4.3 to do the actual regression, on Mac OSX Lion. I'm going to be running ~2,900 different logistic regression ...
1
vote
1answer
24 views

Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset

I'm running Python 2.7 with the Pandas 0.11.0 library installed. I've been looking around and haven't found an answer to this question, so I'm hoping somebody more experienced than I has a solution. ...
1
vote
1answer
33 views

Repeated measures transform in Pandas

Let's say I have a data set from a repeated measures study, which looks like this: control dose_high dose_low gender participant 0 4 6 4 m 1 1 3 ...
0
votes
1answer
50 views

Reference previous row when iterating through dataframe

Is there a simple way to reference the previous row when iterating through a dataframe? In the following dataframe I would like column B to change to 1 when A > 1 and remain at 1 until A < -1, ...
1
vote
2answers
34 views

getting specific median from data

I have a DataFrame with columns time, latitude, and longitude. It looks like this: >>> df.head() time latitude longitude 0 2011-12-16 08:09:07 42.386391 -71.013544 1 ...
1
vote
1answer
48 views

Groupby Clause in Pandas

I am trying to find the GroupBy Clause (in PANDAS DATAFRAME) which can do the following things. InPlace Transformation. Add All the Money If possible then to get the Original Dataframe with Columns "A" ...
2
votes
1answer
45 views

Assign to selection in pandas

I have a pandas dataframe and I want to create a new column, that is computed differently for different groups of rows. Here is a quick example: import pandas as pd data = {'foo': list('aaade'), ...
0
votes
1answer
25 views

how to get the average of dataframe column values

A B DATE 2013-05-01 473077 71333 2013-05-02 35131 62441 2013-05-03 727 27381 2013-05-04 481 1206 2013-05-05 ...
0
votes
1answer
26 views

Using boolean masks in Pandas

This is probably a trivial query but I can't work it out. Essentially, I want to be able to filter out noisy tweets from a dataframe below <class 'pandas.core.frame.DataFrame'> Int64Index: ...
1
vote
1answer
40 views

Pandas group by operations on a data frame

I have a pandas data frame like the one below. UsrId JobNos 1 4 1 56 2 23 2 55 2 41 2 5 3 78 1 25 3 1 I group by the data frame ...
0
votes
0answers
38 views

pandas dataframe.drop(col,axis=1) does not drop column from column.levels in multiindex dataframe

I have a multiindex dataframe from which I am dropping columns using df.drop(col,axis=1). Then, I am looking through column.levels[0] and doing some operations on all the columns. However, when I try ...
1
vote
1answer
29 views

Pandas: Create new dataframe that averages duplicates from another dataframe

Say I have a dataframe my_df with column duplicates, e.g. foo bar foo hello 0 1 1 5 1 1 2 5 2 1 3 5 I would like to create another dataframe that averages the duplicates: foo bar ...
0
votes
1answer
29 views

MySQL `Load Data Infile Local` fails for .csv unless I open and save the file first. How can I avoid this step?

I generate .csv files using a python script by writing a pandas DataFrame to_csv, using utf8 encoding. consEx.to_csv(os.path.join(base_dir, "Database/Tables/Consumption ...
2
votes
2answers
40 views

Specifying date format when converting with pandas.to_datetime

I have data in a csv file with dates stored as strings in a standard UK format - %d/%m/%Y - meaning they look like: 12/01/2012 30/01/2012 The examples above represent 12 January 2012 and 30 January ...
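A minimal sketch of one way to approach this, assuming the dates have already been read in as plain strings: passing an explicit `format` to `pandas.to_datetime` tells the parser to read day first, then month. The Series name `raw` is hypothetical; the two values are the ones quoted in the excerpt.

```python
import pandas as pd

# The two sample strings quoted in the question (day/month/year).
# `raw` is a hypothetical name used only for this illustration.
raw = pd.Series(["12/01/2012", "30/01/2012"])

# An explicit format removes any month-first guessing for ambiguous dates
# such as 12/01/2012.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

print(parsed)
```

With the format given explicitly, a string that does not match raises an error instead of being silently reinterpreted.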
1
vote
1answer
44 views

Pandas Panel Slicing - Improving Performance

All, I'm currently using a panel in pandas to hold my data source. My program is a simple backtesting engine. It is only for personal amusement, however, I'm getting stuck in optimizing it. The ...
1
vote
3answers
45 views

Horizontal stacked bar chart in Matplotlib

I'm trying to create a horizontal stacked bar chart using matplotlib but I can't see how to make the bars actually stack rather than all start on the y-axis. Here's my testing code. fig = ...
3
votes
0answers
48 views

Merge on single level of MultiIndex

Is there any way to merge on a single level of a MultiIndex without resetting the index? I have a "static" table of time-invariant values, indexed by an ObjectID, and I have a "dynamic" table of ...
0
votes
2answers
54 views

pandas convert strings to float for multiple columns in dataframe

I'm new to pandas and trying to figure out how to convert multiple columns which are formatted as strings to float64's. Currently I'm doing the below, but it seems like apply() or applymap() should ...
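One possible approach, sketched under the assumption that every value in the chosen columns is a clean numeric string: select the columns as a list and call `astype` once. The frame and column names below are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical example frame: two string-formatted numeric columns
# and one column that should stay untouched.
df = pd.DataFrame({
    "name": ["x", "y"],
    "a": ["1.5", "2"],
    "b": ["3", "4.25"],
})

# Convert several columns in one assignment.
cols = ["a", "b"]
df[cols] = df[cols].astype("float64")

print(df.dtypes)
```

If some strings might not be valid numbers, `astype` will raise; in that case `pd.to_numeric` with `errors="coerce"` applied per column may be a better fit.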
1
vote
1answer
57 views

HDF5 taking more space than CSV?

Consider the following example: Prepare the data: import string import random import pandas as pd matrix = np.random.random((100, 3000)) my_cols = [random.choice(string.ascii_uppercase) for x in ...
0
votes
0answers
20 views

Unable to save DataFrame to HDF5 (“object header message is too large”)

I have a DataFrame in Pandas: In [7]: my_df Out[7]: <class 'pandas.core.frame.DataFrame'> Int64Index: 34 entries, 0 to 0 Columns: 2661 entries, airplane to zoo dtypes: float64(2659), object(2) ...
-2
votes
2answers
96 views

Printing all values to a .txt file in Python

I wrote a small script that pulls some unnecessary columns from a text file that I'm working with. I'm not sure how to get it to print to a text file without loss of data. import pandas as pd from ...
1
vote
1answer
39 views

Iteratively writing to HDF5 Stores in Pandas

Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files: Prepare some data: In [1142]: store = HDFStore('store.h5') In [1143]: index = date_range('1/1/2000', ...
2
votes
1answer
135 views

Pandas: reshaping data

I have a pandas Series which presently looks like this: 14 [Yellow, Pizza, Restaurants] ... 160920 [Automotive, Auto Parts & Supplies] 160921 [Lighting Fixtures & ...
1
vote
1answer
34 views

Sublists in pandas

I've got a Pandas DataFrame, one of the columns of which looks like this: 0 {u'funny': 2, u'useful': 0, u'cool': 0} 1 {u'funny': 370, u'useful': 487, u'cool': 296} 2 ...
1
vote
1answer
31 views

pandas read_csv end of section flag

Is there a smart/easy way to tell read_csv in pandas not to load data after a certain "end of section" flag? Or for it to stop if it gets to an empty row? data = pd.read_csv(path, **params) eos_line ...
1
vote
1answer
44 views

Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone

You can use the function tz_localize to make a Timestamp or DateTimeIndex timezone aware, but how can you do the opposite: how can you convert a timezone aware Timestamp to a naive one, while ...
0
votes
2answers
51 views

HDF5 and SQLite. Concurrency, compression & I/O performance [closed]

I have read in different places that SQlite does not play nicely with NFS, in particular when you want multiple processes from different machines trying to write to the database. I need a storage ...
0
votes
1answer
47 views

Pandas: Period object to abstract from time

I have the following DataFrame: df = pd.DataFrame({ 'Trader': 'Carl Mark Carl Joe Mark Carl Max Max'.split(), 'Share': list('ABAABAAA'), 'Quantity': [5,2,5,10,1,5,2,1] }, index=[ ...
1
vote
1answer
30 views

Do non-unique indexes provide any performance advantage in pandas?

From the pandas documentation, I've gathered that unique-valued indices make certain operations efficient, and that non-unique indices are occasionally tolerated. From the outside, it doesn't look ...
1
vote
1answer
29 views

Pandas: What are the cases when count returned by DataFrame describe is a floating point

When describing my Pandas dataframe: I get the following result: Mains_1_Power Mains_2_Power count 17.000000 17.000000 mean 57.063528 200.428607 std 67.605151 ...
0
votes
1answer
30 views

Combine sparsely populated columns of the same data in pandas

I have the following dataframe and I would like to combine columns 2,3,4,5 into just one column. | 0 | 1 | 2 | 3 | 4 | 5 | +-----+-----+-----+-----+-----+-----+ | 90 | 90 | A | | ...
0
votes
1answer
75 views

Concatenating and sorting thousands of CSV files

I have thousands of CSV files on disk, each approximately 10MB in size (~10K columns). Most of these columns hold real (float) values. I would like to create a dataframe by ...
1
vote
1answer
43 views

pandas: how to select by partial label in index

Having a series like this: ds = Series({'wikipedia':10,'wikimedia':22,'wikitravel':33,'google':40}) google 40 wikimedia 22 wikipedia 10 wikitravel 33 dtype: int64 I would like to ...
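The excerpt is cut off before the desired result, so the following is only a sketch under the assumption that the goal is to keep the entries whose index label starts with a given prefix (here `"wiki"`). It uses the string accessor on the index to build a boolean mask.

```python
import pandas as pd

# The Series from the question.
ds = pd.Series({"wikipedia": 10, "wikimedia": 22, "wikitravel": 33, "google": 40})

# Assumption: "partial label" means a shared prefix of the index labels.
mask = ds.index.str.startswith("wiki")
wiki = ds[mask]

print(wiki)
```

For a match anywhere in the label rather than at the start, `ds.index.str.contains(...)` can be used the same way.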
1
vote
2answers
45 views

Deleting all columns except a few python-pandas

Say I have a data table 1 2 3 4 5 6 .. n A x x x x x x .. x B x x x x x x .. x C x x x x x x .. x And I want to slim it down so that I only have, say, columns 3 ...
2
votes
3answers
52 views

Getting data from .csv file python (panda)

I am working on a python project where I have a .csv file like this. freq,ae,cl,ota 825,1,2,3 835,4,5,6 850,10,11,12 880,22,23,24 910,46,47,48 960,94,95,96 1575,190,191,192 1710,382,383,384 ...
-2
votes
0answers
51 views

Why accessing larger index in pandas series takes longer? [closed]

I have a function that I would like to apply element-wise to several series. def my_fun(s1, s2, p1, p2, p3, angle_cutoff, s_cutoff): a1 = xy2angle(p1, s1) a2 = xy2angle(p2, s2) if ...
4
votes
2answers
69 views

Is there universal if function in numpy?

I have three series. I need to do the following operation element-wise: Compare values from the first and second series. If first is larger take arcsine of the element from the third series. ...
0
votes
1answer
27 views

How to define a function in pandas that takes series as an argument?

I have a data frame and I want to create a new column whose values are defined by values located in other columns (in the same row). It is very simple if I use simple operations (+, -, * and even ...
1
vote
2answers
62 views

What is the most idiomatic way to index an object with a boolean array in pandas?

I am particularly talking about Pandas version 0.11 as I am busy replacing my uses of .ix with either .loc or .iloc. I like the fact that differentiating between .loc and .iloc communicates whether I ...
1
vote
1answer
40 views

Stop pandas plot from doing new x-axis layout

I have a problem of an automatic x-axis rescaling happening when I do the following: plot column 1 plot column 1 where column 2 is notnull, but with different style. The second plot keeps ...
1
vote
1answer
37 views

Python 3.3 pandas, pip-3.3

So, I'm trying to install pandas for Python 3.3 and have been having a really hard time, between Python 2.7 and Python 3.3 and other factors. Some pertinent information: I am running Mac OSX Lion ...
0
votes
1answer
61 views

Reading Files in HDFS (Hadoop filesystem) directories into a Pandas dataframe

I am generating some delimited files from hive queries into multiple HDFS directories. As the next step, I would like to read the files into a single pandas dataframe in order to apply standard ...
1
vote
1answer
29 views

Appending to an empty data frame in Pandas?

Is it possible to append to an empty data frame that doesn't contain any indices or columns? I have tried to do this, but keep getting an empty dataframe at the end. e.g. df = pd.DataFrame() data = ...
0
votes
2answers
54 views

Having issues reading a .csv file python-pandas

I'm trying to read this .txt file in pandas and this is my result. I thought (naively) that I was getting the hang of this stuff last night, but I'm wrong apparently. If I simply run rebull = ...
2
votes
1answer
57 views

Pandas Convert 'NA' to NaN

I just picked up Pandas to do some data analysis work in my biology research. Turns out one of the proteins I'm analyzing is called 'NA'. I have a matrix with pairwise 'HA, M1, M2, NA, NP...' on ...
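A hedged sketch, assuming the matrix is loaded with `read_csv` and that the aim is to keep the literal label 'NA' from being parsed as a missing value: `keep_default_na=False` switches off the built-in list of missing-value markers, and `na_values` then lists only the markers that should still count as missing. The CSV text and column names here are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical data: 'NA' is a real protein name, the empty field is missing.
csv_text = "protein,score\nHA,1.0\nNA,2.0\nNP,\n"

# Disable the default NA markers (which include "NA") and treat only
# empty fields as missing.
df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,
    na_values=[""],
)

print(df)
```

The same two parameters apply when reading from a file path instead of an in-memory string.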
0
votes
2answers
54 views

rename index of a pandas dataframe

I have a pandas dataframe whose indices look like: df.index ['a_1', 'b_2', 'c_3', ... ] I want to rename these indices to: ['a', 'b', 'c', ... ] How do I do this without specifying a dictionary ...
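One way this could be done without building a dictionary, sketched with a made-up frame: `DataFrame.rename` accepts a function for `index`, which is applied to each label. The lambda below assumes every label has the form `<name>_<number>` as in the excerpt.

```python
import pandas as pd

# Hypothetical frame using the index labels shown in the question.
df = pd.DataFrame({"value": [1, 2, 3]}, index=["a_1", "b_2", "c_3"])

# Pass a function instead of a dictionary: keep the part before the underscore.
renamed = df.rename(index=lambda label: label.split("_")[0])

print(renamed.index.tolist())
```

`rename` returns a new frame here, so the original `df` keeps its old labels.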
-2
votes
1answer
45 views

Pivoting duplicate columns into rows

This is the input file I have from reading a csv file: Sample Info D3S1358 1 D3S1358 2 TH01 1 TH01 2 D21S11 1 D21S11 2 D21S11 3 TEST_646 17 ...
