How can I put a NumPy multidimensional array in an HDF5 file using PyTables?

From what I can tell I can't put an array field in a pytables table.

I also need to store some info about this array and be able to do mathematical computations on it.

Any suggestions?

Honestly, if you're storing a lot of just straight up ND arrays, you're better off with h5py instead of pytables. It's as simple as f.create_dataset('name', data=x) where x is your numpy array and f is the open hdf file. Doing the same thing in pytables is possible, but considerably more difficult. – Joe Kington Jan 12 '12 at 22:16
    
Joe, +1. I was about to post an almost identical comment. – Sven Marnach Jan 12 '12 at 22:21
    
I thought of that, but pytables has some features (tables.Expr) to do calculations directly on the arrays. Can I have that with h5py? – scripts Jan 12 '12 at 22:22

@scripts - Not in the compressed, accelerated way that pytables does. (Or at least not that I know of, anyway.) pytables will also give you lots of nice querying abilities. h5py is better suited to straight-up storage and slicing of on-disk arrays (and is more pythonic, i.m.o., too). Not to plug my own answer too much, but my thoughts on the tradeoff between the two are here: stackoverflow.com/questions/7883646/… – Joe Kington Jan 12 '12 at 22:34
    
Thanks for the info, Joe Kington. For my case pytables is better suited because of its powerful querying techniques. – scripts Jan 12 '12 at 22:43
Accepted answer (27 votes)

There may be a simpler way, but this is how you'd go about doing it, as far as I know:

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array...
# (PyTables 3.0 renamed openFile/createCArray to open_file/create_carray)
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
ds = f.create_carray(f.root, 'somename', atom, x.shape)
ds[:] = x
f.close()
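
The question also asks about keeping some info with the array. One possible approach, sketched here under the assumption of PyTables 3.x, is to attach attributes to the array node and read them back later together with a slice of the data. The attribute names `units` and `description` and the file name are made up for illustration.

```python
import os
import tempfile

import numpy as np
import tables

x = np.random.random((10, 10, 10))

# Hypothetical file location, placed in a temp dir so the sketch cleans up easily
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.h5")

# Write the array and attach some descriptive attributes to it
with tables.open_file(path, "w") as f:
    atom = tables.Atom.from_dtype(x.dtype)
    ds = f.create_carray(f.root, "somename", atom, x.shape)
    ds[:] = x
    ds.attrs.units = "arbitrary"               # made-up attribute name
    ds.attrs.description = "random test data"  # made-up attribute name

# Reopen the file, read the attributes and pull a slice back into memory
with tables.open_file(path, "r") as f:
    ds = f.root.somename
    units = ds.attrs.units
    block = ds[:2, :, 0]  # only this slice is requested from the node
```

Slicing the node returns an ordinary NumPy array, so any NumPy computation can be applied to `block` afterwards.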

If you want to specify the compression to use, have a look at tables.Filters. For example:

import numpy as np
import tables

# Generate some data
x = np.random.random((100,100,100))

# Store "x" in a chunked array with level 5 BLOSC compression...
# (PyTables 3.0 renamed openFile/createCArray to open_file/create_carray)
f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
filters = tables.Filters(complib='blosc', complevel=5)
ds = f.create_carray(f.root, 'somename', atom, x.shape, filters=filters)
ds[:] = x
f.close()

There's probably a simpler way for a lot of this... I haven't used pytables for anything other than table-like data in a long while.
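
For the on-disk calculations raised in the comments on the question, `tables.Expr` can evaluate an expression over a stored array and write the result into another array node. This is only a rough sketch assuming PyTables 3.x; the node names `a` and `result`, the file name, and the expression itself are placeholders.

```python
import os
import tempfile

import numpy as np
import tables

x = np.random.random((20, 20, 20))

# Hypothetical file location for the sketch
path = os.path.join(tempfile.mkdtemp(), "expr_demo.h5")

with tables.open_file(path, "w") as f:
    atom = tables.Atom.from_dtype(x.dtype)

    # Input array stored on disk
    a = f.create_carray(f.root, "a", atom, x.shape)
    a[:] = x

    # Output array on disk with the same shape and dtype
    out = f.create_carray(f.root, "result", atom, x.shape)

    # Pass the operand explicitly instead of relying on name lookup
    expr = tables.Expr("2 * a + 1", uservars={"a": a})
    expr.set_output(out)  # write results into the on-disk node
    expr.eval()

    # Read the result back only to inspect it
    result = out[:]
```

Passing `uservars` explicitly is a design choice made here so the sketch does not depend on which variables happen to be in scope.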

    
thanks it worked flawlessly!! – scripts Jan 12 '12 at 23:32

Note that this can now be done much more straightforwardly using the create_array method on file objects, as described in the section 'Creating new array objects' at pytables.github.io/usersguide/tutorials.html – Ben Allison Oct 2 '14 at 15:52
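
A minimal sketch of the `create_array` route mentioned in the comment above, assuming PyTables 3.x; the file name is a placeholder. Note that `create_array` produces a plain `Array` node, which is not chunked or compressed, so `create_carray` is still the one to use with `tables.Filters`.

```python
import os
import tempfile

import numpy as np
import tables

x = np.random.random((10, 10, 10))

# Hypothetical file location for the sketch
path = os.path.join(tempfile.mkdtemp(), "create_array_demo.h5")

# Shape and dtype are taken from the NumPy array itself
with tables.open_file(path, "w") as f:
    f.create_array(f.root, "somename", x)

# Read the whole array back into memory
with tables.open_file(path, "r") as f:
    restored = f.root.somename.read()
```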
