Tagged Questions
0
votes
1answer
14 views
Changing a specific column name in pandas DataFrame
I was looking for an elegant way to change a specified column name in a DataFrame.
play data ...
import pandas as pd
d = {
'one': [1, 2, 3, 4, 5],
'two': [9, 8, 7, 6, 5],
...
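A minimal sketch of one common approach, assuming the play data holds only the two columns visible in the excerpt; the replacement name new_name is made up for illustration.

```python
import pandas as pd

# Play data limited to the two columns visible in the excerpt above.
df = pd.DataFrame({"one": [1, 2, 3, 4, 5], "two": [9, 8, 7, 6, 5]})

# rename returns a new DataFrame; only the column named in the mapping changes.
renamed = df.rename(columns={"two": "new_name"})  # "new_name" is hypothetical
print(list(renamed.columns))
```

The original df keeps its column names unless the result is assigned back.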
0
votes
1answer
14 views
Pandas and sum and cum sum in same dataframe
I use the code below to create a sum and a cumsum, but they end up in two separate DataFrames. I want everything in one.
asp = np.array(np.array([0,0,1]))
asq = np.array(np.array([10,10,20]))
columns=['asp']
df = ...
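The excerpt is cut off, so this is only one possible reading: sum asq within each asp group and keep a running total beside it. The column names total and running_total are assumptions.

```python
import numpy as np
import pandas as pd

asp = np.array([0, 0, 1])
asq = np.array([10, 10, 20])
df = pd.DataFrame({"asp": asp, "asq": asq})

# Per-group sum as a one-column DataFrame, then the cumulative sum beside it.
out = df.groupby("asp")["asq"].sum().to_frame("total")  # "total" is hypothetical
out["running_total"] = out["total"].cumsum()            # so is "running_total"
print(out)
```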
0
votes
1answer
25 views
Pandas (python) plot() without a legend
When using the pandas library in Python and calling .plot() on a DataFrame, how do I display the plot without a legend?
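A small sketch, assuming a DataFrame with made-up columns a and b; the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for running outside a notebook

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})  # hypothetical data

# legend=False tells DataFrame.plot not to draw a legend on the axes.
ax = df.plot(legend=False)
print(ax.get_legend())
```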
2
votes
1answer
35 views
Python Pandas working with dataframes in functions
I have a DataFrame which I want to pass to a function, derive some information from and then return that information. Originally I set up my code like:
df = pd.DataFrame( {
'A': ...
0
votes
1answer
24 views
Merging Pandas DataFrames with the same column name
I have a dataset, let's say:
Column with duplicates value1 value2
1 5 0
1 0 9
And what I want
Column ...
3
votes
0answers
24 views
using pandas to parse a section inside a JSON document
I'm trying to analyze my electric bill usage (hourly data downloaded in JSON format! woot!) with pandas. I can do it, but it's clunkier than I expected:
import pandas as pd
import json
with ...
0
votes
1answer
34 views
statsmodels data update frequency [on hold]
Consider a vector of hourly observations, where the data is updated every 12 hours.
With R, I could do
ts(vector_with_R, frequency=12)
In statsmodels "freq" controls the units for the time series ...
1
vote
1answer
30 views
Receiving `KeyError: u'no item named XYZ'` error
Here's the error I receive after running it:
Traceback (most recent call last):
File "t1.py", line 255, in <module>
pivot_rating = ratings.pivot(index='User-ID', ...
1
vote
0answers
47 views
ImportError: No module named dateutil.parser
I am receiving the following error when importing pandas in a Python program
monas-mbp:book mona$ sudo pip install python-dateutil
Requirement already satisfied (use --upgrade to upgrade): ...
1
vote
2answers
24 views
Purpose of 'ax' keyword in pandas scatter_matrix function
I'm puzzled by the meaning of the 'ax' keyword in the pandas scatter_matrix function:
pd.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', ...
2
votes
1answer
20 views
Pandas Dataframe Bar Plot - Qualitative Variable?
I am looking to create a simple bar plot, where the grouped qualitative values for "source" are the x-axis and the quantitative values of "retweet_count" are the y-axis.
I want the total number of ...
1
vote
0answers
23 views
Looking for dataframe.apply() without shape restrictions
I want to do spline interpolations separately on each column of a smaller DataFrame time series to create a more finely resolved DataFrame time series with a larger dimension than the original.
So, ideally ...
0
votes
1answer
21 views
Pandas group by will not work
What am I missing here? I am trying to do a group by.
asp = np.array(np.array([0,0,1]))
asq = np.array(np.array([10,10,20]))
columns=['asp']
df = pd.DataFrame(asp, index=None, columns=columns)
...
1
vote
1answer
30 views
pandas dataframe as field in django
I want to add a pandas DataFrame (or a numpy array) as a field in a Django model. Each model instance in Django has a large 2D array associated with it, so I want to store it as a numpy array or pandas ...
0
votes
2answers
45 views
How to Convert (timestamp, value) array to timeseries [on hold]
I have a rather straightforward problem I'd like to solve with more efficiency than I'm currently getting.
I have a bunch of data coming in as a set of monitoring metrics. Input data is structured ...
3
votes
1answer
34 views
unstacking data with Pandas
I have some data that I'm taking from 'long' to 'wide'. I have no problem using unstack to make the data wide, but then I end up with what looks like an index which I can't get rid of. Here's a dummy ...
2
votes
2answers
40 views
Pandas head command does not give the expected results
I can't get pandas features to work for me. Here's a simple example. I read in a kaggle data set to a data frame with the following commands:
import pandas as pd
...
1
vote
0answers
38 views
How to avoid Python/Pandas creating an index in a saved csv?
I am trying to save a csv to a folder after making some edits to the file.
Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid this so I ...
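A minimal sketch of the usual way to leave the index out, assuming a small made-up DataFrame; it writes to an in-memory buffer instead of the path from the question.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # hypothetical data

# index=False omits the row index column from the written CSV.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Passing a file path instead of buf works the same way.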
0
votes
1answer
44 views
Series.index versus Series.index.values
The pandas series has two closely related attributes: Series.index and Series.index.values.
The first of these two returns the current index of some pandas index type. It is mutable, and can be used ...
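To make the distinction concrete, a short sketch on a made-up Series: s.index is a pandas Index object, s.index.values is a plain numpy array, and a whole new index can be assigned to s.index.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])  # hypothetical data

idx = s.index          # a pandas Index object
vals = s.index.values  # the underlying numpy ndarray
print(type(idx).__name__, isinstance(vals, np.ndarray))

# Assigning to s.index replaces the whole index of the Series.
s.index = ["x", "y", "z"]
print(list(s.index))
```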
3
votes
1answer
64 views
What is the point of .ix indexing for pandas Series
For the Series object (let's call it s), pandas offers three types of addressing.
s.iloc[] -- for integer position addressing;
s.loc[] -- for index label addressing; and
s.ix[] -- for a hybrid of ...
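For readers on current pandas: .ix was deprecated in 0.20 and removed in 1.0, so only the first two forms remain. A small sketch of how they differ, on a made-up Series whose integer labels are deliberately out of positional order:

```python
import pandas as pd

# Hypothetical Series: the label 0 sits at position 1, not position 0.
s = pd.Series(["a", "b", "c"], index=[2, 0, 1])

by_label = s.loc[0]      # looks up the index label 0
by_position = s.iloc[0]  # looks up the first element
print(by_label, by_position)
```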
2
votes
0answers
55 views
Efficient way to get largest box sizes of similar elements from a 2D numpy array (or pandas dataframe) [duplicate]
Basically I have a 2 dimensional numpy array filled with boolean values.
For example:
[[0,0,0,1],
[0,0,0,1],
[1,1,1,1],
[1,1,1,1],
[1,0,0,0],
[1,0,1,1]]
I need to figure out where the largest ...
1
vote
2answers
55 views
Pandas: Assigning multiple *new* columns simultaneously
I have a DataFrame with a column containing labels for each row (in addition to some relevant data for each row). I have a dictionary with keys equal to the possible labels and values equal to ...
0
votes
2answers
46 views
Pivot a pandas dataframe and get the non-axis columns as a series
I have a data set pulled from a database using pandas.io.sql.read_frame which looks like this
Period Category Projected Actual Previous
0 2013-01 A 1214432.94 ...
1
vote
2answers
55 views
Pandas: Make a new column by linearly interpolating between existing columns
Say I have a DataFrame containing data about the temperature at various altitudes on a mountain, each sampled simultaneously once per day. The altitude of each probe is fixed (i.e. they stay constant ...
3
votes
3answers
103 views
Hourly frequency count with Python
I have hourly CSV data, sorted day by day like this, for hundreds of days:
2011.05.16,00:00,1.40893
2011.05.16,01:00,1.40760
2011.05.16,02:00,1.40750
2011.05.16,03:00,1.40649
I want to make a count ...
3
votes
0answers
56 views
Matplotlib & Numpy incompatibility with some Pandas functions? - integer is required
First of all, I am using '2.7.3 | 64-bit, pandas 0.12.0, and numpy 1.8.0. I am following this tutorial on Pandas time series, however when I get to this:
...
0
votes
1answer
36 views
Getting Pandas to work
Pandas only works with my IPython notebooks, not when I try to use it on my computer in a regular script. When I try to import pandas, it says 'no module found.' I'm getting quite confused looking ...
-1
votes
1answer
60 views
Python Pandas read_excel() module not found
I'm currently trying to use pandas.read_excel() to import an .xlsx file using this basic script
import pandas as pd
x = pd.read_excel("crypt_db.xlsx", "sheet1")
and I get a module ('read_excel') ...
0
votes
2answers
39 views
Best way to auto-update Python Class Attributes
I'm a bit of a noob to classes (I've mostly done functional programming), so although I've used one specific method to achieve the following, if there are "best practices" that implement what I'm looking for ...
3
votes
2answers
56 views
TimeSeries with a groupby in Pandas
I would like to look at TimeSeries data for every client over various time periods in Pandas.
import pandas as pd
import numpy as np
import random
clients = np.random.randint(1, 11, size=100)
dates ...
0
votes
1answer
41 views
Strange indexing behavior in Pandas
I was thinking about a potential problem in a recent project that could be caused by a non-unique index in pandas so I started playing around with some scenarios to see what would happen. In doing ...
0
votes
1answer
31 views
Appending column totals to a Pandas DataFrame
I have a dataframe with numerical values. What is the simplest way of appending a row (with a given index value) which represents the sum of each column?
Thx
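One simple option, sketched on made-up data: assign the column sums to a new index label with .loc. The label "Total" and the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]},
                  index=["r1", "r2", "r3"])  # hypothetical data

# df.sum() gives one value per column; assigning it to a new label appends a row.
with_total = df.copy()
with_total.loc["Total"] = df.sum()
print(with_total)
```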
1
vote
1answer
63 views
Calculating a cumulative deviation from mean monthly value in pandas series
How would I use pandas to calculate a cumulative deviation from a mean monthly rainfall value?
I am given daily rainfall data (e.g. s, below) which I can convert to a pd.Series and resample into ...
2
votes
1answer
50 views
Reading csv containing a list in Pandas
I'm trying to read this csv into pandas
HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']"
HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:59.237387']"
HK,"[u'5328.1', u'5329.3', '2013-12-27 ...
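A possible approach, assuming the file has no header row and that the quoted field is always a valid Python list literal: parse the second column with ast.literal_eval through the converters argument. The column names code and payload are made up.

```python
import ast
import io

import pandas as pd

# One complete row from the question, held in memory instead of a file.
raw = io.StringIO(
    """HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']"
"""
)

# literal_eval safely turns the list-looking string into a real Python list.
df = pd.read_csv(raw, header=None, names=["code", "payload"],
                 converters={"payload": ast.literal_eval})
print(df["payload"].iloc[0])
```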
5
votes
3answers
190 views
Looking for a quick way to speed up my code
I am looking for a way to speed up my code. I managed to speed up most parts of my code, reducing runtime to about 10 hours, but it's still not fast enough and since I'm running out of time I'm ...
1
vote
2answers
50 views
Convert pandas.TimeSeries to R.ts
I have some pandas TimeSeries with date index:
import pandas as pd
import numpy as np
pandas_ts = pd.TimeSeries(np.random.randn(100),pd.date_range(start='2000-01-01', periods=100))
I need to convert ...
2
votes
1answer
140 views
Pandas dataframe get first row of each group
I have a pandas DataFrame like the following.
df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
'value' : ["first","second","second","first",
...
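A minimal sketch on shortened, made-up data (the question's own frame is truncated above): groupby followed by head(1) keeps the first row of every group along with its original index.

```python
import pandas as pd

# Hypothetical stand-in for the truncated DataFrame in the question.
df = pd.DataFrame({"id": [1, 1, 2, 2, 3],
                   "value": ["first", "second", "first", "second", "first"]})

# head(1) returns the first row of each group, in the original row order.
first_rows = df.groupby("id").head(1)
print(first_rows)
```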
1
vote
0answers
167 views
Optimization in Python
I am trying to optimize a portfolio in python. Using pandas I've pulled close prices from Yahoo! Finance and calculated the standard deviation of each of my assets. I've also created a dictionary that ...
1
vote
1answer
110 views
pandas bar plot xtick frequency
I want to create a simple bar chart for a pandas DataFrame object. However, the xtick on the chart appears to be too granular, whereas if I change the plot to a line chart, xtick is optimized for better ...
1
vote
2answers
505 views
Finding the intersection between two series in Pandas
I have two series s1 and s2 in pandas/python and want to compute the intersection, i.e. the values that are common to both series.
How would I use the concat function to do this? I have been ...
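A sketch of one alternative that does not use concat at all: filter s1 with isin. The values in s1 and s2 are made up.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])  # hypothetical data
s2 = pd.Series([3, 4, 5])     # hypothetical data

# Keep the elements of s1 whose values also appear in s2.
common = s1[s1.isin(s2)]
print(common.tolist())
```

This keeps the index labels from s1 and any duplicates in s1.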
19
votes
0answers
1k views
Memory error when using pandas read_csv
I am trying to do something fairly simple, reading a large csv file into a pandas dataframe.
This is what I am using to do this:
data = pandas.read_csv(filepath, header = 0, sep = DELIMITER,skiprows ...
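One commonly suggested workaround, sketched under the assumption that the file can be processed piece by piece: pass chunksize to read_csv and iterate over the resulting chunks. The tiny in-memory CSV and the column name a are stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file on disk.
csv_data = io.StringIO("a,b\n1,2\n3,4\n5,6\n")

# With chunksize set, read_csv yields DataFrames of at most that many rows.
total = 0
n_chunks = 0
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk["a"].sum()
    n_chunks += 1
print(total, n_chunks)
```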
9
votes
1answer
874 views
Multidimensional Scaling Fitting in Numpy, Pandas and Sklearn (ValueError)
I'm trying out multidimensional scaling with sklearn, pandas and numpy. The data file I'm using has 10 numerical columns and no missing values. I am trying to take this ten-dimensional data and ...
3
votes
2answers
602 views
Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone
You can use the function tz_localize to make a Timestamp or DateTimeIndex timezone aware, but how can you do the opposite: how can you convert a timezone aware Timestamp to a naive one, while ...
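A small sketch of the opposite direction, assuming a recent pandas version: tz_localize(None) drops the timezone and keeps the local wall-clock time, while tz_convert("UTC") followed by tz_localize(None) gives naive UTC times. The dates and the Europe/Brussels zone are made up.

```python
import pandas as pd

# Hypothetical timezone-aware index.
aware = pd.DatetimeIndex(["2013-05-18 12:00", "2013-05-18 13:00"]).tz_localize(
    "Europe/Brussels"
)

naive_local = aware.tz_localize(None)                  # keeps local wall time
naive_utc = aware.tz_convert("UTC").tz_localize(None)  # wall time in UTC
print(naive_local[0], naive_utc[0])
```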
2
votes
3answers
2k views
Parse a Pandas column to Datetime
I have a DataFrame with a column named date. How can we convert/parse the 'date' column to a DateTime object?
I loaded the date column from a Postgresql database using sql.read_frame(). An example of ...
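A minimal sketch, assuming the column holds date strings in a format pandas can parse; the sample values are made up because the question's example is cut off.

```python
import pandas as pd

# Hypothetical stand-in for the column loaded from the database.
df = pd.DataFrame({"date": ["2013-12-27 13:58:57", "2013-12-28 08:00:00"]})

# to_datetime parses the strings into a datetime64 column.
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)
```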
7
votes
2answers
1k views
Make more than one chart in same IPython Notebook cell
I have started my IPython Notebook with
ipython notebook --pylab inline
This is my code in one cell
df['korisnika'].plot()
df['osiguranika'].plot()
This is working fine, it will draw two lines, ...
0
votes
1answer
292 views
Pandas - finding the row with least value of one of the levels of a multiindex
So, I have a DataFrame with a multiindex which looks like this:
info1 info2 info3
abc-8182 2012-05-08 10:00:00 1 6.0 "yeah!"
2012-05-08 ...
6
votes
2answers
651 views
Convert pandas DateTimeIndex to Unix Time?
What is the idiomatic way of converting a pandas DateTimeIndex to (an iterable of) Unix Time?
This is probably not the way to go:
[time.mktime(t.timetuple()) for t in ...
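A vectorized alternative to the list comprehension, sketched for a naive index that is assumed to be in UTC: subtract the epoch and floor-divide by one second. The sample timestamps are made up.

```python
import pandas as pd

idx = pd.DatetimeIndex(["1970-01-01 00:00:10", "2013-12-27 00:00:00"])  # hypothetical

# Timestamps minus the epoch give timedeltas; floor division by 1s gives integers.
unix_seconds = (idx - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
print(list(unix_seconds))
```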
5
votes
2answers
563 views
Formatting latex (to_latex) output
I've read about the to_latex method, but it's not clear how to use the formatters argument.
I have some numbers which are too long and some for which I want thousands separators.
A side issue for the ...
82
votes
5answers
13k views
“Large data” work flows using pandas
I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for its out-of-core support. However, SAS is horrible ...
6
votes
2answers
985 views
filtering grouped df in pandas
PS: cross-posted on the pydata mailing list. Sorry, I am in need of quick help.
I am creating a groupby object from a pandas df and want to select out all the groups with > 1 size. The following doesn't ...
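One way this is often done, sketched on made-up data: GroupBy.filter keeps the rows of every group for which the function returns True. The column names key and val are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "c", "c", "c"],
                   "val": [1, 2, 3, 4, 5, 6]})  # hypothetical data

# Keep only rows belonging to groups with more than one member.
big_groups = df.groupby("key").filter(lambda g: len(g) > 1)
print(big_groups)
```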