I am populating a NumPy array with the contents of a CSV file. The number of columns in the CSV file may change. I am trying to concatenate the first two string columns (date + time) into a datetime object, and I have found an example of this on Stack Overflow. However, that example would require me to change the script every time the number of columns changes.

Here is the example:

#! /usr/bin/python
# variable number of numpy array for loop arguments, but only care about the first two 

import numpy as np
import csv
import os
import datetime

# simulate a csv file
from StringIO import StringIO
data = StringIO("""
Title
Date,Time,Speed
,,(m/s)
2012-04-01,00:10, 85
2012-04-02,00:20, 86
2012-04-03,00:30, 87
""".strip())

next(data)  # eat away the first line, which is the title
header = [item.strip() for item in next(data).split(',')] # get the headers
#print header
arr = np.genfromtxt(data, delimiter=',', skip_header=1, dtype=None) # skip the units row (skip_header replaces the removed skiprows argument)
arr.dtype.names = header # assign the header to names. so we can use it to do indexing

y1 = arr['Speed']   # column headings were assigned previously by arr.dtype.names = header

# Here is an example from:
# http://stackoverflow.com/questions/7500864/python-array-of-datetime-objects-from-numpy-ndarray

date_objects = np.array([datetime.datetime.strptime(a + b, "%Y-%m-%d%H:%M") 
                        for a,b,c in arr])
print date_objects

Question: The for statement above iterates over the rows of a NumPy array. Right now I unpack a,b,c because I have three columns, but if I ever add a fourth column, this statement breaks with ValueError: too many values to unpack, which isn't very robust. If I only care about the first two columns, a and b in this case, how can I rewrite this? Is there a way to say for a,b,... in arr?

I have already tried slicing arr down to the first two columns.

# Note 1: Slicing fails with IndexError: too many indices
#arr_date_time = arr[:,:2]

The workaround for the slicing error is to set dtype=object and not set dtype.names, but I would like to keep dtype.names set because it makes indexing the columns more readable. See my related post: Numpy set dtype=None, cannot splice columns and set dtype=object cannot set dtype.names


1 Answer

Accepted answer (score: 1)

Try this:

date_objects = np.array([datetime.datetime.strptime(row[0] + row[1], "%Y-%m-%d%H:%M") 
                    for row in arr])
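
For anyone reading this on Python 3, here is a minimal, self-contained sketch of the same idea. It is not the asker's script: the fourth column (Heading), the helper name parse, and the encoding="utf-8" argument (which needs a reasonably recent NumPy) are assumptions made for illustration. It shows three ways to read only the first two fields regardless of how many columns follow: by position, by field name, and with star-unpacking.

```python
import datetime
import io

import numpy as np

# Hypothetical sample: the question's data plus an extra fourth column.
data = io.StringIO(
    "Date,Time,Speed,Heading\n"
    "2012-04-01,00:10,85,N\n"
    "2012-04-02,00:20,86,NE\n"
    "2012-04-03,00:30,87,E\n"
)

# names=True takes the field names from the first line;
# encoding="utf-8" makes the text fields come back as str rather than bytes.
arr = np.genfromtxt(data, delimiter=",", names=True, dtype=None, encoding="utf-8")

FMT = "%Y-%m-%d%H:%M"


def parse(date_text, time_text):
    """Join a date string and a time string and parse them as one datetime."""
    return datetime.datetime.strptime(str(date_text) + str(time_text), FMT)


# 1. By position, as in the answer above: extra columns are simply ignored.
by_index = [parse(row[0], row[1]) for row in arr]

# 2. By field name: does not depend on column order either.
by_name = [parse(d, t) for d, t in zip(arr["Date"], arr["Time"])]

# 3. Star-unpacking (Python 3 only): *_ swallows any remaining fields.
#    tolist() turns each row into a plain tuple first.
by_star = [parse(d, t) for d, t, *_ in arr.tolist()]

print(by_index[0])  # 2012-04-01 00:10:00
```

Note that arr here is a one-dimensional array of records (arr.ndim is 1), which is why arr[:,:2] fails with "too many indices": there is no second axis to slice, only named fields within each row.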
Cool, that works, thanks. – frank Jul 23 at 1:49