Python array of datetime objects from numpy ndarray

Question

I have a NumPy ndarray with two columns: one holds a date, e.g. 2011-08-04, and the other holds a time, e.g. 19:00:00:081.

How can I combine them into one array of datetime objects? Currently they are stored as strings in the NumPy array.

What's the dtype of the array? Are the columns objects or fixed-length string fields? — Sven Marnach, Sep 21 '11 at 13:53
@Sven Marnach: this is a continuation of reading ascii file... — unutbu, Sep 21 '11 at 13:59
@ykt: Can you remove the tab between 2011-08-04 and 19:00:00:08 when creating the original text file? If there is no whitespace, there is a slick way to form the right array with np.genfromtxt (without having to merge columns). — unutbu, Sep 21 '11 at 14:01
@unutbu: Unfortunately no, there are thousands of them and more to come! However, I would really like to have a look at your version as well. — abudis, Sep 21 '11 at 14:03
If a is your array, you can access its dtype using a.dtype. If the columns are fixed-width string columns, this would allow for a minor optimisation as we can skip the step of joining them by reinterpreting the data. This would not be possible if they are Python str objects. — Sven Marnach, Sep 21 '11 at 14:09

unutbu · Accepted Answer · 2011-09-21 19:28:52Z

If the date and time strings in the example.txt data file were given as a single column with no separating whitespace, then genfromtxt could convert them into datetime objects like this:

import numpy as np
import datetime as dt
def mkdate(text):
    # Date and time are joined with no separator, e.g. '2011-08-0419:00:00:081'
    return dt.datetime.strptime(text, '%Y-%m-%d%H:%M:%S:%f')
data = np.genfromtxt(
    'example.txt',
    names=('data','num','date')+tuple('col{i}'.format(i=i) for i in range(19)),
    converters={'date':mkdate},
    dtype=None)
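To make the converter's behaviour concrete, here is a minimal check of mkdate on a single joined field. The sample string is an assumption, built from the values quoted in the question. Note that %f reads the trailing 081 as a fraction of a second, i.e. 81000 microseconds.

```python
import datetime as dt

def mkdate(text):
    # Parse a date and a time written back to back with no separator.
    return dt.datetime.strptime(text, '%Y-%m-%d%H:%M:%S:%f')

# Hypothetical field value, assembled from the example in the question.
stamp = mkdate('2011-08-0419:00:00:081')
print(stamp.isoformat())
print(stamp.microsecond)
```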

Given example.txt as it is, you could form the desired numpy array with

import numpy as np
import datetime as dt
import csv

def mkdate(text):
    return dt.datetime.strptime(text, '%Y-%m-%d%H:%M:%S:%f')    

def using_csv(fname):
    # 'O' is the portable code for an object field ('O4' is specific to 32-bit builds)
    desc=([('data', '|S4'), ('num', '<i4'), ('date', 'O')]+
          [('col{i}'.format(i=i), '<f8') for i in range(19)])
    with open(fname,'r') as f:
        reader=csv.reader(f,delimiter='\t')
        data=np.array([tuple(row[:2]+[mkdate(''.join(row[2:4]))]+row[4:])
                       for row in reader],
                      dtype=desc)
    # print(mc.report_memory())        
    return data

Merging two columns in a NumPy array can be slow, especially when the array is large, because merging, like resizing, requires allocating memory for a new array and copying the data from the original array into it. So I think it is worth trying to form the correct array directly rather than in stages (forming a partially correct array and then merging two columns).
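As a rough, self-contained illustration of building the array in one pass, here is a small sketch that should run on Python 3. The two sample rows, the shortened column list and the name load_direct are assumptions made for the example; the real file has more columns.

```python
import csv
import datetime as dt
import io

import numpy as np

# Hypothetical two-row sample: tab-separated, date and time in separate columns.
SAMPLE = (
    "ab\t1\t2011-08-04\t19:00:00:081\t1.5\t2.5\n"
    "cd\t2\t2011-08-04\t19:00:00:181\t3.5\t4.5\n"
)

def mkdate(text):
    return dt.datetime.strptime(text, '%Y-%m-%d%H:%M:%S:%f')

def load_direct(stream):
    # One structured dtype for the final layout; 'O' holds the datetime objects.
    desc = [('data', 'U4'), ('num', 'i4'), ('date', 'O'),
            ('col0', 'f8'), ('col1', 'f8')]
    reader = csv.reader(stream, delimiter='\t')
    # Join the date and time fields while each row is still a list of strings,
    # so no partially correct array is built first.
    rows = [(row[0], int(row[1]), mkdate(row[2] + row[3]),
             float(row[4]), float(row[5]))
            for row in reader]
    return np.array(rows, dtype=desc)

data = load_direct(io.StringIO(SAMPLE))
print(data['date'])
```

The only allocation of the final array happens in the single np.array call, which is the point of forming it directly.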

By the way, I tested the above csv code versus merging two columns (below). Forming a single array from csv (above) was faster (and the memory usage was about the same):

import matplotlib.cbook as mc
import numpy as np
import datetime as dt

def using_genfromtxt(fname):
    data = np.genfromtxt(fname, dtype=None)

    orig_desc=data.dtype.descr
    view_desc=orig_desc[:2]+[('date','|S22')]+orig_desc[4:]
    new_desc=orig_desc[:2]+[('date','O')]+orig_desc[4:]  # 'O' is the portable object code

    newdata = np.empty(data.shape, dtype=new_desc)
    fields=data.dtype.names
    fields=fields[:2]+fields[4:]
    for field in fields:
        newdata[field] = data[field]

    newdata['date']=np.vectorize(mkdate)(data.view(view_desc)['date'])
    # print(mc.report_memory())

    return newdata  

# using_csv('example4096.txt')
# using_genfromtxt('example4096.txt')

example4096.txt is the same as example.txt, duplicated 4096 times. It's about 12K lines long.

% python -mtimeit -s'import test' 'test.using_genfromtxt("example4096.txt")'
10 loops, best of 3: 1.92 sec per loop

% python -mtimeit -s'import test' 'test.using_csv("example4096.txt")'
10 loops, best of 3: 982 msec per loop

Sven Marnach · Answer 2 · 2011-09-21 14:30:33Z

To answer the question as it is, given a two-column NumPy array a, you could do

import datetime
import numpy
b = numpy.array([datetime.datetime.strptime(s + t, "%Y-%m-%d%H:%M:%S:%f")
                 for s, t in a])

Since the comments indicate that the original array a is constructed using genfromtxt(), you are probably better off joining the columns in the text file and defining a suitable converter (see the converters argument to genfromtxt()).
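One possible way to do that without editing the file on disk is to join the two fields line by line and hand the result to genfromtxt() together with a converter. This is only a sketch: the sample rows, the column names and the helper join_date_time are assumptions, and the converter accepts both bytes and str because what genfromtxt() passes in depends on the NumPy version and the encoding argument.

```python
import datetime as dt

import numpy as np

def mkdate(text):
    # Converters may receive bytes or str depending on NumPy version/encoding.
    if isinstance(text, bytes):
        text = text.decode('ascii')
    return dt.datetime.strptime(text, '%Y-%m-%d%H:%M:%S:%f')

def join_date_time(lines, date_col=2):
    # Yield each line with the date and time fields glued together.
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        merged = fields[date_col] + fields[date_col + 1]
        yield ' '.join(fields[:date_col] + [merged] + fields[date_col + 2:])

# Hypothetical sample rows; the real file has more columns.
raw = [
    "ab\t1\t2011-08-04\t19:00:00:081\t1.5",
    "cd\t2\t2011-08-04\t19:00:00:181\t2.5",
]

data = np.genfromtxt(
    list(join_date_time(raw)),
    names=('data', 'num', 'date', 'col0'),
    converters={'date': mkdate},
    dtype=None,
    encoding='utf-8',
)
print(data['date'])
```

For a real file, the lines could come from an open file object instead of the raw list.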

Edit: If the columns are of types S10 and S12 respectively as indicated in the comments, you can do a minor optimisation of this code since you don't need to explicitly join the columns:

a = numpy.array([("2011-08-04", "19:00:00:081"),
                 ("2011-08-04", "19:00:00:181")],
                dtype=[("", "S10"), ("", "S12")])
# On Python 3 the S22 elements are bytes, so decode them before parsing
b = numpy.array([datetime.datetime.strptime(s.decode("ascii"), "%Y-%m-%d%H:%M:%S:%f")
                 for s in a.view("S22")])

The operation a.view("S22") is cheap as it does not copy the data. If your array is really big, this optimisation might be welcome, though it does not make a huge difference.
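A quick way to convince yourself that the view shares memory with the original array is numpy.shares_memory(). The sketch below assumes the S10/S12 layout described above; the field names date and time are made up for the example, and the decode step is there because the elements are bytes on Python 3.

```python
import datetime

import numpy

a = numpy.array([("2011-08-04", "19:00:00:081"),
                 ("2011-08-04", "19:00:00:181")],
                dtype=[("date", "S10"), ("time", "S12")])

# Reinterpret each 22-byte record as one fixed-width string.
joined = a.view("S22")

# The view reuses a's buffer instead of copying it.
print(numpy.shares_memory(a, joined))

b = numpy.array([datetime.datetime.strptime(s.decode("ascii"),
                                            "%Y-%m-%d%H:%M:%S:%f")
                 for s in joined])
print(b)
```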

asked	3 years ago
viewed	7496 times
active	3 years ago
