0
votes
1answer
14 views

Changing a specific column name in pandas DataFrame

I was looking for an elegant way to change a specified column name in a DataFrame. play data ... import pandas as pd d = { 'one': [1, 2, 3, 4, 5], 'two': [9, 8, 7, 6, 5], ...
0
votes
1answer
14 views

Pandas and sum and cum sum in same dataframe

I use the below to create a sum and a cumsum. But they are in two separate dataframes. I want all in one asp = np.array(np.array([0,0,1])) asq = np.array(np.array([10,10,20])) columns=['asp'] df = ...
0
votes
1answer
25 views

Pandas (python) plot() without a legend

Using the pandas library in python and using .plot() on a dataframe, how do I display the plot without a legend?
2
votes
1answer
35 views

Python Pandas working with dataframes in functions

I have a DataFrame which I want to pass to a function, derive some information from and then return that information. Originally I set up my code like: df = pd.DataFrame( { 'A': ...
0
votes
1answer
24 views

Merging Pandas DataFrames with the same column name

I have a dataset, let's say: Column with duplicates value1 value2 1 5 0 1 0 9 And what I want Column ...
3
votes
0answers
24 views

using pandas to parse a section inside a JSON document

I'm trying to analyze my electric bill usage (hourly data downloaded in JSON format! woot!) with pandas. I can do it, but it's klunkier than I expected: import pandas as pd import json with ...
0
votes
1answer
34 views

statsmodels data update frequency [on hold]

Consider a vector with hourly observations; the data is updated every 12 hours. With R, I could do ts(vector_with_R, frequency=12). In statsmodels, "freq" controls the units for the time series ...
1
vote
1answer
30 views

Receiving `KeyError: u'no item named XYZ'` error

Here's the error I receive after running it: Traceback (most recent call last): File "t1.py", line 255, in <module> pivot_rating = ratings.pivot(index='User-ID', ...
1
vote
0answers
47 views

ImportError: No module named dateutil.parser

I am receiving the following error when importing pandas in a Python program monas-mbp:book mona$ sudo pip install python-dateutil Requirement already satisfied (use --upgrade to upgrade): ...
1
vote
2answers
24 views

Purpose of 'ax' keyword in pandas scatter_matrix function

I'm puzzled by the meaning of the 'ax' keyword in the pandas scatter_matrix function: pd.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', ...
2
votes
1answer
20 views

Pandas Dataframe Bar Plot - Qualitative Variable?

I am looking to create a simple bar plot, where the grouped qualitative values for "source" are the x-axis and the quantitative values of "retweet_count" are the y-axis. I want the total number of ...
1
vote
0answers
23 views

Looking for dataframe.apply() without shape restrictions

I want to do Spline interpolations separately on each column of a smaller dataframe timeseries to create a finer resolved dataframe time-series with a larger dimension than the original. So, ideally ...
0
votes
1answer
21 views

Pandas group by will not work

What am I missing here? I am trying to do a group by. asp = np.array(np.array([0,0,1])) asq = np.array(np.array([10,10,20])) columns=['asp'] df = pd.DataFrame(asp, index=None, columns=columns) ...
1
vote
1answer
30 views

pandas dataframe as field in django

I want to add a pandas dataframe (or a numpy array) as a field in a django model. Each model instance in django has a large 2D array associated with it, so I want to store it as a numpy array or pandas ...
0
votes
2answers
45 views

How to Convert (timestamp, value) array to timeseries [on hold]

I have a rather straightforward problem I'd like to solve with more efficiency than I'm currently getting. I have a bunch of data coming in as a set of monitoring metrics. Input data is structured ...
3
votes
1answer
34 views

unstacking data with Pandas

I have some data that I'm taking from 'long' to 'wide'. I have no problem using unstack to make the data wide, but then I end up with what looks like an index which I can't get rid of. Here's a dummy ...
2
votes
2answers
40 views

Pandas head command does not give the expected results

I can't get pandas features to work for me. Here's a simple example. I read in a kaggle data set to a data frame with the following commands: import pandas as pd ...
1
vote
0answers
38 views

How to avoid Python/Pandas creating an index in a saved csv?

I am trying to save a csv to a folder after making some edits to the file. Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid this so I ...
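For the situation described above, one option worth trying is the `index=False` argument of `DataFrame.to_csv` (note that `to_csv` is a DataFrame method, not a function on the `pd` module). A minimal sketch, assuming a small made-up frame and an in-memory buffer standing in for the file path:

```python
import io

import pandas as pd

# Hypothetical data; the original question's file is not shown.
df = pd.DataFrame({"name": ["x", "y"], "value": [1, 2]})

# A StringIO buffer stands in for a real path such as 'C:/Path of file.csv'.
buffer = io.StringIO()

# index=False tells to_csv not to write the row index as an extra column.
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

With a real file, the same call would take the path in place of `buffer`.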
0
votes
1answer
44 views

Series.index versus Series.index.values

The pandas series has two closely related attributes: Series.index and Series.index.values. The first of these two returns the current index of some pandas index type. It is mutable, and can be used ...
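A small sketch of the difference between the two attributes, assuming a made-up Series with integer labels: `Series.index` gives back a pandas `Index` object, while `Series.index.values` gives the labels as a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical Series used only for illustration.
s = pd.Series([1.0, 2.0, 3.0], index=[10, 20, 30])

idx = s.index          # a pandas Index object
raw = s.index.values   # the same labels as a numpy.ndarray

# Assigning to the attribute relabels the Series.
s.index = [100, 200, 300]
```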
3
votes
1answer
64 views

What is the point of .ix indexing for pandas Series

For the Series object (let's call it s), pandas offers three types of addressing. s.iloc[] -- for integer position addressing; s.loc[] -- for index label addressing; and s.ix[] -- for a hybrid of ...
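The distinction between the first two accessors shows up most clearly when the labels are integers that do not match the positions. A minimal sketch, assuming a made-up Series (`.ix` is left out because it was deprecated in pandas 0.20 and removed in 1.0):

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions.
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_position = s.iloc[0]  # first element, regardless of its label
by_label = s.loc[0]      # element whose index label is 0
```

Here `by_position` is 10 and `by_label` is 20, which is the ambiguity a hybrid accessor has to resolve by guessing.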
2
votes
0answers
55 views

Efficient way to get largest box sizes of similar elements from a 2D numpy array (or pandas dataframe) [duplicate]

Basically I have a 2 dimensional numpy array filled with boolean values. For example: [[0,0,0,1], [0,0,0,1], [1,1,1,1], [1,1,1,1], [1,0,0,0], [1,0,1,1]] I need to figure out where the largest ...
1
vote
2answers
55 views

Pandas: Assigning multiple *new* columns simultaneously

I have a DataFrame with a column containing labels for each row (in addition to some relevant data for each row). I have a dictionary with keys equal to the possible labels and values equal to ...
0
votes
2answers
46 views

Pivot a pandas dataframe and get the non-axis columns as a series

I have a data set pulled from a database using pandas.io.sql.read_frame which looks like this Period Category Projected Actual Previous 0 2013-01 A 1214432.94 ...
1
vote
2answers
55 views

Pandas: Make a new column by linearly interpolating between existing columns

Say I have a DataFrame containing data about the temperature at various altitudes on a mountain, each sampled simultaneously once per day. The altitude of each probe is fixed (i.e. they stay constant ...
3
votes
3answers
103 views

Hourly frequency count with Python

I have hourly CSV data, sorted day by day like this, covering hundreds of days: 2011.05.16,00:00,1.40893 2011.05.16,01:00,1.40760 2011.05.16,02:00,1.40750 2011.05.16,03:00,1.40649 I want to make a count ...
3
votes
0answers
56 views

Matplotlib & Numpy incompatibility with some Pandas functions? - integer is required

First of all, I am using Python 2.7.3 (64-bit), pandas 0.12.0, and numpy 1.8.0. I am following this tutorial on Pandas time series, however when I get to this: ...
0
votes
1answer
36 views

Getting Pandas to work

Pandas only works with my iPython notebooks, not when I try to use it on my computer in a regular script. When I try to import pandas, it says 'no module found.' I'm getting quite confused looking ...
-1
votes
1answer
60 views

Python Pandas read_excel() module not found

I'm currently trying to use pandas.read_excel() to import an .xlsx file using this basic script import pandas as pd x = pd.read_excel("crypt_db.xlsx", "sheet1") and I get a module ('read_excel') ...
0
votes
2answers
39 views

Best way to auto-update Python Class Attributes

I'm a bit of a noob to classes (mostly done functional programming), so although I've used one specific method to achieve the following, if there is a "best practice" that implements what I'm looking for ...
3
votes
2answers
56 views

TimeSeries with a groupby in Pandas

I would like to look at TimeSeries data for every client over various time periods in Pandas. import pandas as pd import numpy as np import random clients = np.random.randint(1, 11, size=100) dates ...
0
votes
1answer
41 views

Strange indexing behavior in Pandas

I was thinking about a potential problem in a recent project that could be caused by a non-unique index in pandas so I started playing around with some scenarios to see what would happen. In doing ...
0
votes
1answer
31 views

Appending column totals to a Pandas DataFrame

I have a dataframe with numerical values. What is the simplest way of appending a row (with a given index value) which represents the sum of each column? Thx
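One possible approach, sketched under the assumption of a small made-up frame and a hypothetical row label "Total": assign the result of `DataFrame.sum()` to a new label through `.loc`.

```python
import pandas as pd

# Hypothetical numeric frame; the question does not show its data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# df.sum() is evaluated first (column-wise), then stored under the new label.
df.loc["Total"] = df.sum()
```

The right-hand side is computed before the row exists, so the totals do not include themselves.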
1
vote
1answer
63 views

Calculating a cumulative deviation from mean monthly value in pandas series

How would I use pandas to calculate a cumulative deviation from a mean monthly rainfall value? I am given daily rainfall data (e.g. s, below) which I can convert to a pd.Series and resample into ...
2
votes
1answer
50 views

Reading csv containing a list in Pandas

I'm trying to read this csv into pandas HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']" HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:59.237387']" HK,"[u'5328.1', u'5329.3', '2013-12-27 ...
5
votes
3answers
190 views

Looking for a quick way to speed up my code

I am looking for a way to speed up my code. I managed to speed up most parts of my code, reducing runtime to about 10 hours, but it's still not fast enough and since I'm running out of time I'm ...
1
vote
2answers
50 views

Convert pandas.TimeSeries to R.ts

I have some pandas TimeSeries with date index: import pandas as pd import numpy as np pandas_ts = pd.TimeSeries(np.random.randn(100),pd.date_range(start='2000-01-01', periods=100)) I need to convert ...
2
votes
1answer
140 views

Pandas dataframe get first row of each group

I have a pandas DataFrame like the following. df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7], 'value' : ["first","second","second","first", ...
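A possible way to take the first row of each group, sketched with shortened, made-up data because the frame in the excerpt is truncated: `groupby(...).head(1)` keeps the first row per group along with its original index.

```python
import pandas as pd

# Hypothetical stand-in for the truncated frame in the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "value": ["first", "second", "first", "second", "first"],
})

# head(1) returns the first row of every group, preserving the original order.
first_rows = df.groupby("id").head(1)
```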
1
vote
0answers
167 views

Optimization in Python

I am trying to optimize a portfolio in python. Using pandas I've pulled close prices from Yahoo! Finance and calculated the standard deviation of each of my assets. I've also created a dictionary that ...
1
vote
1answer
110 views

pandas bar plot xtick frequency

I want to create a simple bar chart for pandas DataFrame object. However, the xtick on the chart appears to be too granular, whereas if I change the plot to line chart, xtick is optimized for better ...
1
vote
2answers
505 views

Finding the intersection between two series in Pandas

I have two series s1 and s2 in pandas/python and want to compute the intersection i.e. where all of the values of the series are common. How would I use the concat function to do this? I have been ...
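The question asks about `concat`; the sketch below uses a different technique, `Series.isin`, which may be a simpler fit for finding common values. The two Series are made up for illustration.

```python
import pandas as pd

# Hypothetical inputs.
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([3, 4, 5])

# Keep the values of s1 that also appear somewhere in s2.
common = s1[s1.isin(s2)]
```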
19
votes
0answers
1k views

Memory error when using pandas read_csv

I am trying to do something fairly simple, reading a large csv file into a pandas dataframe. This is what I am using to do this: data = pandas.read_csv(filepath, header = 0, sep = DELIMITER,skiprows ...
9
votes
1answer
874 views

Multidimensional Scaling Fitting in Numpy, Pandas and Sklearn (ValueError)

I'm trying out multidimensional scaling with sklearn, pandas and numpy. The data file I'm using has 10 numerical columns and no missing values. I am trying to take this ten-dimensional data and ...
3
votes
2answers
602 views

Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone

You can use the function tz_localize to make a Timestamp or DateTimeIndex timezone aware, but how can you do the opposite: how can you convert a timezone aware Timestamp to a naive one, while ...
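A sketch of one way this can be done in more recent pandas versions, assuming a made-up index in the Europe/Brussels timezone: passing `None` to `tz_localize` drops the timezone and keeps the local wall time, and a preceding `tz_convert` selects which timezone's wall time is kept.

```python
import pandas as pd

# Hypothetical timezone-aware index (requires timezone data to be available).
aware = pd.date_range("2013-05-18 12:00", periods=2, freq="D", tz="Europe/Brussels")

# Drop the timezone but keep the local wall-clock time.
naive_local = aware.tz_localize(None)

# Convert to UTC first, then drop the timezone, to get naive UTC times.
naive_utc = aware.tz_convert("UTC").tz_localize(None)
```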
2
votes
3answers
2k views

Parse a Pandas column to Datetime

I have a DataFrame with column named date. How can we convert/parse the 'date' column to a DateTime object? I loaded the date column from a Postgresql database using sql.read_frame(). An example of ...
7
votes
2answers
1k views

Make more than one chart in same IPython Notebook cell

I have started my IPython Notebook with ipython notebook --pylab inline This is my code in one cell df['korisnika'].plot() df['osiguranika'].plot() This is working fine, it will draw two lines, ...
0
votes
1answer
292 views

Pandas - finding the row with least value of one of the levels of a multiindex

So, I have a DataFrame with a multiindex which looks like this: info1 info2 info3 abc-8182 2012-05-08 10:00:00 1 6.0 "yeah!" 2012-05-08 ...
6
votes
2answers
651 views

Convert pandas DateTimeIndex to Unix Time?

What is the idiomatic way of converting a pandas DateTimeIndex to (an iterable of) Unix Time? This is probably not the way to go: [time.mktime(t.timetuple()) for t in ...
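One vectorized alternative to the list comprehension, sketched with a made-up index and assuming the naive timestamps represent UTC: subtract the epoch and floor-divide by one second.

```python
import pandas as pd

# Hypothetical naive index, treated as UTC for this sketch.
idx = pd.date_range("2013-01-01", periods=3, freq="D")

# Timedelta since the epoch, floor-divided by one second, gives integer seconds.
unix_seconds = (idx - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
```

Unlike `time.mktime`, this does not depend on the local timezone of the machine running the code.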
5
votes
2answers
563 views

Formatting latex (to_latex) output

I've read about the to_latex method, but it's not clear how to use the formatters argument. I have some numbers which are too long and some which I want thousand separators. A side issue for the ...
82
votes
5answers
13k views

“Large data” work flows using pandas

I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for its out-of-core support. However, SAS is horrible ...
6
votes
2answers
985 views

filtering grouped df in pandas

PS: cross posted on pydata mailing list..sorry I am in need of quick help. I am creating a groupby object from a pandas df and want to select out all the groups with > 1 size. The following doesn't ...
