Okay, I'm stumped on this. I've looked at the Pandas documentation, but I can't figure out the right way to do it, and I think I'm just making a mess. Basically, I have data in NumPy arrays, e.g.

data = numpy.loadtxt('foo.txt', dtype=str,delimiter=',') 
gps_data = numpy.concatenate((data[0:len(data),0:2],data[0:len(data),3:5]),axis=1)
gps_time = data[0:len(data),2:3].astype(float)/1000

gps_data basically looks like this

array([['50.3482627', '-71.662499', '30', 'network'],
       ['50.3482588', '-71.6624934', '30', 'network'],
       ['50.34829', '-71.6625077', '30', 'network'],
       ...,
       ['20.3482488', '-78.66245463999999', '9', 'gps'],
       ['20.3482598', '-78.6625174', '30', 'network'],
       ['20.34824943', '-78.6624565', '10', 'gps']],
      dtype='|S18')

and gps_time

array([[  1.16242035e+09],
       [  1.26242036e+09],
       [  1.36242038e+09],
       ...,
       [  1.32330411e+09],
       [  1.16330413e+09],
       [  1.26330413e+09]])

What I'm trying to do is use DataFrame to bring in another similar-looking array called acc_data, combine it with gps_data, and then go back through and fill in the missing data times. E.g., this is what I've been trying:

df1 = DataFrame(gps_data,index=gps_time,columns=['GPS'])

And it gives this error

ValueError: Shape of passed values is (4, 35047), indices imply (1, 35047)

I don't know how to handle this. If I can find a way around it, I assume the next step (building df2 the same way, but for acc_data) will work fine, and then I can do

p = Panel({'GPS': df1, 'ACC': df2})

Any help would be greatly appreciated; I've been stumped on this for the last few hours.


2 Answers

Accepted answer (score: 2)

You need to make sure you pass in as many column names (using the columns keyword) as there are columns in your NumPy array:

df1 = DataFrame(gps_data, index=gps_time, columns=['col1', 'col2', 'col3', 'col4'])

Pandas raises the error because you've given it an array with four columns but only one column name, 'GPS'.
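To see the mismatch in isolation, here is a minimal sketch using small made-up arrays in place of the real gps_data and gps_time. The column names 'lat', 'lon', 'accuracy' and 'provider' are only illustrative guesses at what the columns mean; any four names would do.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for gps_data (four columns) and gps_time (one value per row).
gps_data = np.array([['50.3482627', '-71.662499', '30', 'network'],
                     ['50.3482588', '-71.6624934', '30', 'network'],
                     ['20.3482488', '-78.6624546', '9', 'gps']])
gps_time = np.array([1.16242035e+09, 1.26242036e+09, 1.36242038e+09])

# One column name for a four-column array raises ValueError.
try:
    pd.DataFrame(gps_data, index=gps_time, columns=['GPS'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Four names for four columns works (the names are hypothetical).
df1 = pd.DataFrame(gps_data, index=gps_time,
                   columns=['lat', 'lon', 'accuracy', 'provider'])
print(df1.shape)
```

Leaving out the columns keyword entirely also works; pandas then numbers the columns 0 to 3.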

    
Sweet, thanks, although now when I do p = Panel({'GPS':df1,'ACC':df2}) it complains "buffer has wrong number of dimensions expected 1 found 2"? –  eWizardII Oct 6 '14 at 18:28
    
No problem. What is your df2? What shape is it? –  ajcr Oct 6 '14 at 18:33
    
df2 is [7111 rows x 3 columns] (sorry, I don't know how to do formatting properly in comments). Basically, df2 looks like: x y z 1.362420e+09 -0.249893 4.125504 9.105667 1.362420e+09 -2.738571 5.260941 8.285629 –  eWizardII Oct 6 '14 at 18:35

@eWizardII Hmmm... I can't seem to replicate the error and I'm afraid I haven't played around with Panel a great deal. It might be a bug if you're using an older version of Pandas. If not, perhaps asking a new question is the way to go... –  ajcr Oct 6 '14 at 19:04
    
Alright will do thanks! I have version 0.14.1 on Windows which should be the latest version or close to it I believe. –  eWizardII Oct 6 '14 at 19:09

ajcr is right; the error can be avoided by specifying the right number of columns. Since gps_data has shape (35047, 4), the DataFrame has four columns. So you need columns=['col1', 'col2', 'col3', 'col4'] if you are going to specify column names.

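To get the data in the right shape, it would also be easier to use

import numpy as np
import pandas as pd
# dtype=None infers a type per column; because the last column holds
# strings, the result is a 1-D structured array with fields f0..f4,
# so columns are selected by field name rather than by position.
data = np.genfromtxt('foo.txt', dtype=None, delimiter=',',
                     usecols=[0,1,2,3,4], encoding=None)
gps_time = data['f2']/1000.0

and then you can build the DataFrame with

# Use the times as the index and drop the raw time column.
df1 = pd.DataFrame(data, index=gps_time).drop(columns='f2')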

Caveats:

gps_time = data[0:len(data),2:3]

makes gps_time 2-dimensional with shape (35047, 1). If you use

gps_time = data[0:len(data),2]

then gps_time will be 1-dimensional, with shape (35047,). This is more likely what you want, since the index (time) appears to be 1-dimensional.
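A quick way to check the difference, sketched on a small made-up integer array rather than the real data:

```python
import numpy as np

# Made-up 4x3 array standing in for the loaded data.
data = np.arange(12).reshape(4, 3)

col_2d = data[:, 2:3]    # a slice keeps the column axis
col_1d = data[:, 2]      # an integer index drops it
flat = col_2d.ravel()    # flattens an existing (n, 1) array to (n,)

print(col_2d.shape)  # (4, 1)
print(col_1d.shape)  # (4,)
print(flat.shape)    # (4,)
```

If you already have the 2-dimensional version, ravel() is one way to flatten it before using it as an index.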


data = numpy.loadtxt('foo.txt', dtype=str,delimiter=',')

makes all your numbers strings. If you use

np.genfromtxt('foo.txt', dtype=None, )

the dtype=None tells genfromtxt to make an intelligent guess about the type of each column -- so your float-like numbers will automatically have dtype float.
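As a rough illustration of that type guessing, the sketch below parses two made-up lines laid out like the file in the question (an assumption, since foo.txt itself isn't shown). The field names f0 to f4 are NumPy's defaults when no names are given, and encoding=None is passed so the text column comes back as str rather than bytes.

```python
import io
import numpy as np
import pandas as pd

# Two made-up rows: lat, lon, time in milliseconds, accuracy, provider.
text = ("50.3482627,-71.662499,1162420350000,30,network\n"
        "20.3482488,-78.6624546,1362420380000,9,gps\n")

data = np.genfromtxt(io.StringIO(text), dtype=None, delimiter=',',
                     encoding=None)

# Mixed column types produce a 1-D structured array, one field per column.
print(data.dtype.names)
print(data['f0'].dtype.kind, data['f2'].dtype.kind, data['f4'].dtype.kind)

# Fields are selected by name; pandas accepts the structured array directly.
gps_time = data['f2'] / 1000.0
df = pd.DataFrame(data)
print(df.dtypes)
```

Note that with mixed types the result is indexed by field name (data['f2']), not by column position.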

    
Alright, I'll try this also. It might be the cause of the problem; I just followed up on the other answer to say that I get an error when using Panel. –  eWizardII Oct 6 '14 at 18:29
