I was wondering if there was an easy way to create a class to handle both integer and keyword indexing of a numpy array of numbers.

The end goal is to have a numpy array that I can also index using the names of each variable. For example, if I have the lists

import numpy as np
a = [0,1,2,3,4]
names = ['name0','name1','name2','name3','name4']
A = np.array(a)

I would like to be able to get the values of A easily with a call of (for example) A['name1'], yet have the array retain all of the functionality of a numpy array.

Thanks!

Peter

Edit:

Thanks so much for the help, I'll try to be more clear on the intended use! I have an existing set of code which uses a numpy array to store and apply a vector of variables. My vector has around 30 entries.

When I want to see the value of a particular variable, or when I want to make a change to one of them, I have to remember which entry corresponds to which variable (the order or number of entries doesn't necessarily change once the array is created). Right now I use a dictionary to keep track. For example, I have a numpy array 'VarVector' with 30 values. "vmax" is entry 15, with a value of 0.432. I'll then have a concurrent dictionary 'VarDict' with 30 keys, such that VarDict[name] = index. This way I can find the value of vmax by chaining the calls

VarVector[VarDict["vmax"]]

which would return 0.432

I was wondering if there would be a good way of simply combining these two structures, such that both VarVector[15] (for compatibility) and VarVector["vmax"] (for convenience to me) would point to the same number.

Thanks! Peter

The point of numpy arrays is that they're written in C and hence fast. If you do this you lose the benefit of numpy arrays -- you might as well use a Python list! – katrielalex Jan 17 at 22:33
Can you give a reason why you want to do this? – katrielalex Jan 17 at 22:33
@katrielalex - Not necessarily... The __getitem__ of a numpy array is already quite slow. You're not going to significantly slow things down by adding this to it. However, this is a fairly common use case and has already been done a couple of times (pandas and larry). Have a look at this comparison: scipy.org/StatisticalDataStructures Having "labeled axes" or "labeled items" is a nice thing to have in some cases. – Joe Kington Jan 18 at 0:06
Fair enough, I stand corrected. Thanks =) – katrielalex Jan 18 at 0:24

3 Answers

up vote 1 down vote accepted

From your description, it sounds like you just want a structured array (which is built-in to numpy). E.g.

# Let's suppose we have 30 observations with 5 variables each...
# The five variables are temp, pressure, x-velocity, y-velocity, and z-velocity
x = np.random.random((30, 5))

# Make a structured dtype to represent our variables...
# (np.float has been removed from recent NumPy releases; np.float64 is equivalent)
dtype = np.dtype(dict(names=['temp', 'pressure', 'x_vel', 'y_vel', 'z_vel'],
                      formats=5 * [np.float64]))

# Now view "x" as a structured array with the dtype we created...
# The view has shape (30, 1), so flatten it to one record per observation
data = x.view(dtype).ravel()

# Each measurement will now have the name fields we created...
print(data[0])
print(data[0]['temp'])

# If we want, say, all the "temp" measurements:
print(data['temp'])

# Or all of the "temp" and "x_vel" measurements:
print(data[['temp', 'x_vel']])
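
For the single vector of variables described in the question, the same trick can be applied to a 1-D array: a structured view gives name-based access, the original array keeps its integer indexing, and both share the same memory. A minimal sketch, assuming every entry is a float64 and reusing the names from the question's first example:

```python
import numpy as np

values = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
names = ['name0', 'name1', 'name2', 'name3', 'name4']

# One float64 field per name, so the whole vector becomes a single record
dtype = np.dtype([(name, np.float64) for name in names])
named = values.view(dtype)      # shape (1,), shares memory with values

by_name = named['name1'][0]     # name-based lookup
by_position = values[1]         # integer lookup on the original array

# Writing through the named view changes the original array as well
named['name3'] = 30.0
```

The catch is that there are now two objects (values and named) instead of one, and the view only works when all entries have the same dtype.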

Also have a look at rec arrays. They're slightly more flexible but significantly slower.

# fromarrays expects one array per field, so pass the columns of x
data = np.rec.fromarrays(x.T,
              names=['temp', 'pressure', 'x_vel', 'y_vel', 'z_vel'])
print(data.temp)

However, you'll soon hit the limitations of either of these methods (i.e. you can't name both axes). In that case, have a look at larry if you just want to label items, or pandas if you want labeled arrays with a lot of nice missing-value handling.


I have not tested this, but it should work.

The idea is to assume that the input is an int and use it for the numpy array, and if it isn't, use it for the dict.

import numbers
import numpy

class ThingArray:
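    def __init__(self, size=10):
        # numpy.array() cannot be called without data, so preallocate a
        # fixed-size array instead (the default size of 10 is arbitrary)
        self.numpy_array = numpy.zeros(size)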
        self.other_array = dict()

    def __setitem__(self, key, value):
        if isinstance(key, numbers.Integral):
            self.numpy_array[key] = value
        else:
            self.other_array[key] = value

    def __getitem__(self, key):
        if isinstance(key, numbers.Integral):
            return self.numpy_array[key]
        else:
            return self.other_array[key]


thing = ThingArray()

thing[1] = 100
thing["one"] = "hundred"

print(thing[1])
print(thing["one"])
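
Note that the class above keeps named items in a separate dict, so thing[1] and thing["one"] are unrelated values. If a name should refer to the same entry as its integer position (the VarVector/VarDict setup from the question), the dict can hold positions instead. A rough sketch of that variant; LabeledVector and its attribute names are made up for illustration, and the values are only loosely based on the question:

```python
import numpy as np

class LabeledVector:
    """Wrap a 1-D array so entries can be reached by position or by name."""

    def __init__(self, values, names):
        self.values = np.asarray(values, dtype=float)
        # Map each name to its position, like VarDict in the question
        self.positions = {name: i for i, name in enumerate(names)}

    def _resolve(self, key):
        # Strings are looked up in the name map; anything else
        # (ints, slices, masks) is passed to numpy unchanged
        return self.positions[key] if isinstance(key, str) else key

    def __getitem__(self, key):
        return self.values[self._resolve(key)]

    def __setitem__(self, key, value):
        self.values[self._resolve(key)] = value


var_vector = LabeledVector([0.1, 0.432, 0.7], ['vmin', 'vmax', 'vmid'])
var_vector['vmax'] = 0.5           # same entry as var_vector[1]
```

This is a plain wrapper rather than an ndarray, so NumPy functions have to be given var_vector.values.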

You could subclass ndarray and override the relevant methods (i.e. __getitem__, __setitem__, ...). More info here. This is similar to @Joe's answer, but has the advantage that it preserves almost all of the functionality of the ndarray. The cost is that string keys no longer select fields of a structured array, so you obviously won't be able to do the following anymore:

In [25]: array = np.empty(3, dtype=[('char', '|S1'), ('int', int)])

In [26]: array['int'] = [0, 1, 2]

In [27]: array['char'] = ['a', 'b', 'c']

In [28]: array
Out[28]: 
array([('a', 0), ('b', 1), ('c', 2)], 
      dtype=[('char', '|S1'), ('int', '<i8')])

In [29]: array['char']
Out[29]: 
array(['a', 'b', 'c'], 
      dtype='|S1')

In [30]: array['int']
Out[30]: array([0, 1, 2])

If we knew why you wanted to do this, we might be able to give a more detailed answer.
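
For what it's worth, the subclassing idea could look roughly like the following. This is an untested-in-production sketch for Python 3: NamedArray and its _index attribute are names invented here, it assumes a 1-D float vector, and it drops the name map whenever a result has a different shape than its parent, so that slices don't carry stale names.

```python
import numpy as np

class NamedArray(np.ndarray):
    """1-D ndarray that also accepts string keys (hypothetical helper)."""

    def __new__(cls, values, names):
        obj = np.asarray(values, dtype=float).view(cls)
        obj._index = {name: i for i, name in enumerate(names)}
        return obj

    def __array_finalize__(self, obj):
        # Runs for every new instance, including views and slices.
        # Keep the parent's names only if the shape is unchanged.
        index = getattr(obj, "_index", {})
        same_shape = getattr(obj, "shape", None) == self.shape
        self._index = index if same_shape else {}

    def _translate(self, key):
        # Turn a name into its integer position; leave other keys alone
        return self._index[key] if isinstance(key, str) else key

    def __getitem__(self, key):
        return super().__getitem__(self._translate(key))

    def __setitem__(self, key, value):
        super().__setitem__(self._translate(key), value)


A = NamedArray([0, 1, 2, 3, 4],
               ['name0', 'name1', 'name2', 'name3', 'name4'])
A['name1'] = 10.0      # writes the same entry as A[1]
total = float(A.sum())
```

Reordering operations that keep the shape (A[::-1], sorting) would still leave the names pointing at the old positions, and the Python-level __getitem__ adds overhead to every element access, so treat this as a convenience rather than a drop-in replacement.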
