
I have a large 2D array that I would like to declare once and occasionally change only some values, depending on a parameter, without traversing the whole array.

To build this array, I have subclassed the numpy ndarray class with dtype=object and assigned a function to the elements I want to change, e.g.:

def f(parameter):
    return parameter**2

for i in range(np.shape(A)[0]):
    for j in range(np.shape(A)[1]):
        A[i, j] = 1.
    A[i, i] = f  # assign the callable after the constants so it is not overwritten

I have then overridden the __getitem__ method so that it returns the result of calling the stored function with the given parameters if the value is callable, and the value itself otherwise.

    def __getitem__(self, key):
        value = super(myclass, self).__getitem__(key)
        if callable(value):
            return value(*self.args)
        else:
            return value

where self.args was previously stored on the instance of myclass.
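A minimal, self-contained sketch of the approach described above; the class name MyArray and the constructor are assumptions for illustration, not the asker's actual code:

```python
import numpy as np

# Hypothetical sketch of the described subclass; names are illustrative.
class MyArray(np.ndarray):
    def __new__(cls, input_array, args=()):
        # Store the data as dtype=object so cells can hold callables.
        obj = np.asarray(input_array, dtype=object).view(cls)
        obj.args = args
        return obj

    def __array_finalize__(self, obj):
        # Propagate args through views and slices.
        if obj is not None:
            self.args = getattr(obj, 'args', ())

    def __getitem__(self, key):
        value = super(MyArray, self).__getitem__(key)
        if callable(value):
            return value(*self.args)
        return value

def f(parameter):
    return parameter**2

A = MyArray(np.ones((3, 3)), args=(5,))
A[0, 0] = f
print(A[0, 0], A[0, 1])   # the callable cell evaluates to 25, the rest stays 1.0
```

Note that every element access goes through a Python-level `callable` check, which is exactly the overhead the comments below warn about.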

However, I need to work with float arrays in the end, and I can't simply convert this array to a dtype=float array with this technique. I also tried numpy views, which do not work for dtype=object either.

Do you have a better alternative? Should I override the view method rather than __getitem__?

Edit: I may have to use Cython in the future, so if you have a solution involving e.g. C pointers, I am interested.

  • It is an interesting approach, but I'm not sure that numpy arrays are suited for it. In general, when you work with numpy you use vectorized operations on full arrays or slices, not element-by-element access. Subclassing ndarray the way you do, you essentially lose all the advantages of fast numpy operations. You might be better off creating your own class from scratch and saving everything in pure Python structures (lists etc.); performance-wise it will be comparable. Why do you really need lazy evaluation? You can change only some elements efficiently with fancy indexing.
    – rth
    Commented Jun 18, 2015 at 19:35
  • 1
    Do you only have a single function f? With constant arguments? Commented Jun 18, 2015 at 21:05
  • 2
    Are you familiar with scipy.sparse? The dok format is a dictionary, with the (i,j) tuple as keys. That and lil (list of lists) are the 2 fastest ways of accessing/changing selected items.
    – hpaulj
    Commented Jun 19, 2015 at 6:34
  • 1
    @hpaulj: dok is very interesting. However, I cannot use it with dtype=object, as in the example I showed above: github.com/scipy/scipy/issues/2528
    – Damlatien
    Commented Jun 19, 2015 at 7:46
  • 1
    @rth: The reason I need lazy evaluation rather than accessing the array by key (even efficiently) is that each assignment might involve a different kind of indices. In the example above I only set the diagonal to be variable, but I could, for instance, also have assigned a row (or something more complicated) to another function g.
    – Damlatien
    Commented Jun 19, 2015 at 7:53
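The dok idea from the comments can be approximated with a plain Python dict keyed by (i, j) tuples. This is a hedged sketch of that idea (the class and its names are illustrative, not scipy's actual dok implementation), extended to hold callables for the lazy entries:

```python
import numpy as np

# Illustrative sketch: a dict keyed by (i, j), in the spirit of the dok
# format, storing callables only for the entries that depend on a parameter.
class LazyDict(object):
    def __init__(self, shape, fill=1.0):
        self.shape = shape
        self.fill = fill
        self.entries = {}          # (i, j) -> callable

    def __setitem__(self, key, func):
        self.entries[key] = func

    def to_array(self, *args):
        # Materialize a plain float array, evaluating only the stored callables.
        out = np.full(self.shape, self.fill)
        for (i, j), func in self.entries.items():
            out[i, j] = func(*args)
        return out

ld = LazyDict((3, 3))
for i in range(3):
    ld[i, i] = lambda p: p**2   # variable diagonal
print(ld.to_array(3.0))         # ones everywhere, 9.0 on the diagonal
```

Materializing to a native float array at the end sidesteps the dtype=object limitation mentioned above.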

1 Answer


In this case, it does not make sense to bind a transformation function to every index of your array.

Instead, a more efficient approach is to define each transformation as a function together with the subset of the array it applies to. Here is a basic implementation:

import numpy as np

class LazyEvaluation(object):
    def __init__(self):
        self.transforms = []

    def add_transform(self, function, selection=slice(None), args=None):
        # avoid a shared mutable default argument
        self.transforms.append((function, selection, args or {}))

    def __call__(self, x):
        y = x.copy()
        for function, selection, args in self.transforms:
            y[selection] = function(y[selection], **args)
        return y

that can be used as follows:

x = np.ones((6, 6)) * 2

le = LazyEvaluation()
le.add_transform(lambda x: 0, ([3], [0]))                  # equivalent to x[3, 0]
le.add_transform(lambda x: x**2, (slice(4), slice(4, 6)))  # equivalent to x[:4, 4:6]
le.add_transform(lambda x: -1, np.diag_indices(x.shape[0]))  # set the diagonal
result = le(x)
print(result)

which prints,

[[-1.  2.  2.  2.  4.  4.]
 [ 2. -1.  2.  2.  4.  4.]
 [ 2.  2. -1.  2.  4.  4.]
 [ 0.  2.  2. -1.  4.  4.]
 [ 2.  2.  2.  2. -1.  2.]
 [ 2.  2.  2.  2.  2. -1.]]

This way you can easily support all of numpy's advanced indexing (element-by-element access, slicing, fancy indexing, etc.) while keeping your data in an array with a native dtype (float, int, etc.), which is much more efficient than using dtype=object.
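For the parameter-dependent case from the question, the args dict is stored by reference, so the same transform list can be re-evaluated with a new parameter without rebuilding anything. A small sketch (the class is repeated so it runs on its own, and the keyword name p is an arbitrary choice):

```python
import numpy as np

# The LazyEvaluation class from the answer, repeated for self-containment.
class LazyEvaluation(object):
    def __init__(self):
        self.transforms = []

    def add_transform(self, function, selection=slice(None), args=None):
        self.transforms.append((function, selection, args or {}))

    def __call__(self, x):
        y = x.copy()
        for function, selection, args in self.transforms:
            y[selection] = function(y[selection], **args)
        return y

x = np.ones((4, 4))
params = {'p': 2.0}                 # kept by reference inside the transform
le = LazyEvaluation()
le.add_transform(lambda v, p: p**2, np.diag_indices(4), args=params)

y1 = le(x)          # diagonal is 2.0**2 == 4.0
params['p'] = 3.0   # change the parameter without touching the array
y2 = le(x)          # diagonal is now 9.0
print(y1[0, 0], y2[0, 0])
```

Mutating a shared dict works here but is implicit; passing a fresh args dict per evaluation would be the more explicit design.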

  • Thanks, I have implemented this, basically as a subclass of dict, and it works pretty much the way I want. However, just out of curiosity, I would like to know whether it is possible to realize something similar in C/C++ with pointers in a more elegant way. For example, one could declare a table of (float) pointers, and at declaration time each pointer would point either to zero or to the result of a function, so that I can update the matrix just by calling one (or several) functions.
    – Damlatien
    Commented Jun 25, 2015 at 14:42
  • But to do that, each call of a function would have to be bound to one particular pointer, and I am not sure that is doable. I hope I was clear enough.
    – Damlatien
    Commented Jun 25, 2015 at 14:42
