I am populating a numpy array with the contents of a CSV file. The number of columns in the CSV file may change. I am trying to concatenate the first two string columns (date + time) into a date object, and I have found an example for this on Stack Overflow. However, this example would require me to change the script every time the number of columns changes.
Here is the example:
#! /usr/bin/python
# variable number of numpy array for loop arguments, but only care about the first two
import numpy as np
import csv
import os
import datetime
# simulate a csv file
from StringIO import StringIO
data = StringIO("""
Title
Date,Time,Speed
,,(m/s)
2012-04-01,00:10, 85
2012-04-02,00:20, 86
2012-04-03,00:30, 87
""".strip())
next(data) # eat away the first line, which is the title
header = [item.strip() for item in next(data).split(',')] # get the headers
#print header
arr = np.genfromtxt(data, delimiter=',', skiprows=1, dtype=None) #skip the unit rows
arr.dtype.names = header # assign the header to names. so we can use it to do indexing
y1 = arr['Speed'] # column headings were assigned previously by arr.dtype.names = header
# Here is an example from:
# http://stackoverflow.com/questions/7500864/python-array-of-datetime-objects-from-numpy-ndarray
date_objects = np.array([datetime.datetime.strptime(a + b, "%Y-%m-%d%H:%M")
                         for a,b,c in arr])
print date_objects
Question: The for clause above iterates over a numpy array. Right now I unpack a,b,c because I have three columns, but if I ever add a fourth column, the statement breaks with ValueError: too many values to unpack, which isn't very robust. If I only care about the first two columns, a and b in this case, how can I rewrite this? Is there a way to say for a,b,... in arr?
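To make the goal concrete, here is a minimal sketch of one possible approach: index each record by position instead of unpacking every field. It assumes the array is a structured array like the one genfromtxt returns here. The hand-built arr, its unicode string dtypes, and the extra "Heading" column are hypothetical stand-ins, not part of my real data.

```python
import datetime
import numpy as np

# Hypothetical stand-in for the array built by genfromtxt, with a fourth
# column ("Heading") added to check that the loop no longer depends on
# the number of columns.
arr = np.array(
    [("2012-04-01", "00:10", 85, 1.5),
     ("2012-04-02", "00:20", 86, 2.5),
     ("2012-04-03", "00:30", 87, 3.5)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", "i8"), ("Heading", "f8")],
)

# Take only the first two fields of each record by position,
# so any further columns are simply ignored.
date_objects = np.array([
    datetime.datetime.strptime(row[0] + row[1], "%Y-%m-%d%H:%M")
    for row in arr
])
print(date_objects)
```

Because nothing is unpacked, adding or removing trailing columns should not affect the loop, as long as the date and time stay in the first two positions.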
I have already tried slicing arr down to the first two columns.
# Note 1: slicing fails with IndexError: too many indices
#arr_date_time = arr[:,:2]
The workaround for the slicing error is to set dtype=object and not set dtype.names, but I would like to keep dtype.names set since it makes indexing the columns more readable. See my related post, "Numpy set dtype=None, cannot splice columns and set dtype=object cannot set dtype.names".
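The following sketch illustrates what seems to be going on with the slicing error, under the assumption that arr is a structured array: such an array is one-dimensional (one record per row), so a two-dimensional slice has no second axis to work on. The hand-built arr and its unicode string dtypes are hypothetical stand-ins for what genfromtxt produces. With dtype.names kept, the two columns can be read by name and paired with zip.

```python
import datetime
import numpy as np

# Hypothetical stand-in for the structured array returned by genfromtxt.
arr = np.array(
    [("2012-04-01", "00:10", 85),
     ("2012-04-02", "00:20", 86),
     ("2012-04-03", "00:30", 87)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", "i8")],
)

# A structured array has one axis: each element is a whole record.
print(arr.shape)

# Slicing along a second axis therefore raises IndexError.
try:
    arr[:, :2]
except IndexError as exc:
    print("IndexError:", exc)

# Reading the fields by name keeps dtype.names and ignores other columns.
date_objects = np.array([
    datetime.datetime.strptime(d + t, "%Y-%m-%d%H:%M")
    for d, t in zip(arr["Date"], arr["Time"])
])
print(date_objects)
```

This version depends on the header names rather than the column order, which may or may not be preferable depending on how the CSV layout changes.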