variable number of numpy array for loop arguments required to match variable column numbers

Question

I am populating a numpy array with the contents of a CSV file. The number of columns in the CSV file may change. I am trying to concatenate the first two string columns (date + time) into a datetime object, and I have found an example for this on Stack Overflow. However, this example would require me to change the script every time the number of columns changes.

Here is the example:

#! /usr/bin/python
# variable number of numpy array for loop arguments, but only care about the first two 

import numpy as np
import csv
import os
import datetime

# simulate a csv file
from StringIO import StringIO
data = StringIO("""
Title
Date,Time,Speed
,,(m/s)
2012-04-01,00:10, 85
2012-04-02,00:20, 86
2012-04-03,00:30, 87
""".strip())

next(data)  # eat away the first line, which is the title
header = [item.strip() for item in next(data).split(',')] # get the headers
#print header
arr = np.genfromtxt(data, delimiter=',', skip_header=1, dtype=None) # skip the units row
arr.dtype.names = header # assign the header to names. so we can use it to do indexing

y1 = arr['Speed']   # column headings were assigned previously by arr.dtype.names = header

# Here is an example from:
# http://stackoverflow.com/questions/7500864/python-array-of-datetime-objects-from-numpy-ndarray

date_objects = np.array([datetime.datetime.strptime(a + b, "%Y-%m-%d%H:%M") 
                        for a,b,c in arr])
print date_objects

Question: The for statement above iterates over a numpy array. Right now I specify a,b,c because I have three columns, but if I ever add a fourth column, this statement breaks with ValueError: too many values to unpack, which isn't very robust. If I only care about the first two columns, a and b in this case, how may I rewrite this? Is there a way to say for a,b,... in arr?

I have already tried slicing arr down to the first two columns.

# Note1: Slicing fails with IndexError: too many indices
#arr_date_time = arr[:,:2]

The workaround for the slicing error is to set dtype=object and not set dtype.names, but I would like to keep dtype.names set since it makes indexing the columns more readable. See my related post: Numpy set dtype=None, cannot splice columns and set dtype=object cannot set dtype.names
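
For context, here is a minimal sketch of why the two-dimensional slice fails. It assumes a structured array built by hand with the same three fields; the field widths ("U10", "U5", "i8") are assumptions for illustration, not what genfromtxt would necessarily infer. A structured array is one-dimensional (one record per row), so there is no second axis to slice. Fields can instead be selected by a list of names.

```python
import numpy as np

# Hand-built stand-in for the array read from the CSV (dtype is an assumption).
arr = np.array(
    [("2012-04-01", "00:10", 85),
     ("2012-04-02", "00:20", 86),
     ("2012-04-03", "00:30", 87)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", "i8")],
)

# One record per row: the array has a single axis.
print(arr.ndim, arr.shape)

# A 2-D slice therefore raises IndexError.
try:
    arr[:, :2]
    slice_failed = False
except IndexError:
    slice_failed = True

# Selecting fields by name keeps dtype.names and drops the other columns.
date_time = arr[["Date", "Time"]]
print(date_time.dtype.names)
```

Selecting by name this way does not depend on how many other columns the file has.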

Hans Then · Accepted Answer · 2013-07-19 06:58:18Z


Try this:

date_objects = np.array([datetime.datetime.strptime(row[0] + row[1], "%Y-%m-%d%H:%M") 
                    for row in arr])
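
As a rough sketch of some variations on the same idea, the snippet below compares indexing by position (as above), pairing the columns by field name, and Python 3 extended unpacking, which comes closest to the "for a,b,... in arr" form asked about. The array is built by hand and its dtype is an assumption; the names fmt, by_position, by_name, and by_star are purely illustrative.

```python
import datetime
import numpy as np

# Hand-built stand-in for the array read from the CSV (dtype is an assumption).
arr = np.array(
    [("2012-04-01", "00:10", 85),
     ("2012-04-02", "00:20", 86),
     ("2012-04-03", "00:30", 87)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", "i8")],
)

fmt = "%Y-%m-%d%H:%M"

# 1. Index each row by position; extra columns are simply never touched.
by_position = [datetime.datetime.strptime(row[0] + row[1], fmt) for row in arr]

# 2. Pair the two columns by field name; independent of column order and count.
by_name = [datetime.datetime.strptime(d + t, fmt)
           for d, t in zip(arr["Date"], arr["Time"])]

# 3. Python 3 only: collect any remaining columns into a throwaway list.
#    tolist() converts each record to a plain tuple first.
by_star = [datetime.datetime.strptime(a + b, fmt) for a, b, *_ in arr.tolist()]

print(by_position == by_name == by_star)
```

All three stay valid if a fourth column is added. The name-based version also survives the columns being reordered, as long as the headers stay the same.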


Cool, that works, thanks. – frank Jul 23 at 1:49
