First I'll show a nice solution using structured arrays. The linked documentation has lots of good information on various ways to index, sort, and create them.
Let's define a subset of your data,
import numpy as np
X = np.array( [[3.4,9.13], [3.5,3.43], [3.6,2.01], [3.7,6.11],
[3.8,4.95], [3.9,7.02], [4.0,4.41]] )
I = np.array( [0,1,2,0,1,2,3], dtype=np.int32 )
Structured Array
If we make a structured array (i.e. an array of structs) from this data, the problem is trivial,
sa = np.zeros( len(X), dtype=[('I',np.int64),('X',np.float64,(2))] )
Here we've made a zero-filled structured array. Each element of the array holds a 64-bit integer and a 2-element array of 64-bit floats. The list passed to dtype defines the struct, with each tuple representing one component of the struct. The tuples contain a label, a type, and a shape. The shape part is optional and defaults to a scalar entry.
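To make the dtype description concrete, here is a small sketch that builds the same dtype on its own and inspects it. The names dt and sa_demo are made up for illustration and are not part of the solution.

```python
import numpy as np

# Same layout as above: an int64 label plus a length-2 float64 vector per record.
# dt and sa_demo are hypothetical names used only for this illustration.
dt = np.dtype([('I', np.int64), ('X', np.float64, (2,))])
sa_demo = np.zeros(3, dtype=dt)

print(dt.names)            # the field labels
print(sa_demo['I'].shape)  # scalar field: one value per record
print(sa_demo['X'].shape)  # sub-array field: (number of records, 2)
print(dt.itemsize)         # bytes per record (8 for the int + 2*8 for the floats)
```

Indexing with a field label returns a view over that component for every record, which is why whole-column assignments such as sa['I'] = I work.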
Next we fill the structured array with your data,
sa['I'] = I
sa['X'] = X
At this point you can access the records like so,
>>> sa['X'][sa['I']==2]
array([[ 3.6 , 2.01],
       [ 3.9 , 7.02]])
Here we've asked for all the 'X' records and indexed them using the bool array created by the statement sa['I']==2. The dictionary you want can then be constructed using a comprehension,
d = { i:sa['X'][sa['I']==i] for i in np.unique(sa['I']) }
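As a quick sanity check, the following self-contained sketch rebuilds the structured array from the sample data and prints what ends up under each key. Converting the keys with int() is my own small variation, assumed here only so the keys print as plain Python ints.

```python
import numpy as np

X = np.array([[3.4, 9.13], [3.5, 3.43], [3.6, 2.01], [3.7, 6.11],
              [3.8, 4.95], [3.9, 7.02], [4.0, 4.41]])
I = np.array([0, 1, 2, 0, 1, 2, 3], dtype=np.int32)

sa = np.zeros(len(X), dtype=[('I', np.int64), ('X', np.float64, (2,))])
sa['I'] = I
sa['X'] = X

# One boolean mask per distinct label selects the matching 'X' rows.
# int(i) is an assumption of this sketch; np.unique yields numpy integers.
d = {int(i): sa['X'][sa['I'] == i] for i in np.unique(sa['I'])}

for key, rows in d.items():
    print(key, rows.shape)
```

With this sample, labels 0, 1 and 2 each collect two rows and label 3 collects one.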
Next are two solutions using standard numpy arrays. The first uses np.where and leaves the arrays unmodified; the second sorts the arrays, which should be faster for large I.
Using np.where
The use of np.where is not strictly necessary, as arrays can be indexed using the bool array produced from I==I0 below, but having the actual indices as ints is useful in some circumstances.
def indexby1( X,I,I0 ):
    # rows of X whose label in I equals I0
    indx = np.where( I==I0 )
    sub = X[indx[0],:]
    return sub

def indexby2( X,I ):
    # map every label 0..I.max() to its rows of X
    d = {}
    I0max = I.max()
    for I0 in range(I0max+1):
        d[I0] = indexby1( X, I, I0 )
    return d
d = indexby2( X, I )
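To illustrate the remark that np.where is optional, this sketch selects the same rows both ways, once with the bool array directly and once with the integer indices. The names mask, rows, by_mask and by_index are just for this example.

```python
import numpy as np

X = np.array([[3.4, 9.13], [3.5, 3.43], [3.6, 2.01], [3.7, 6.11],
              [3.8, 4.95], [3.9, 7.02], [4.0, 4.41]])
I = np.array([0, 1, 2, 0, 1, 2, 3], dtype=np.int32)

mask = (I == 1)            # bool array, one entry per row of X
rows = np.where(mask)[0]   # integer positions where the mask is True

by_mask = X[mask]          # index directly with the bool array
by_index = X[rows, :]      # index with the integer positions

print(rows)
print(np.array_equal(by_mask, by_index))
```

Both forms return the same rows, so the choice mostly comes down to whether you need the integer positions for something else.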
Sorting and pulling out chunks
Alternatively you can use the sorting solution mentioned and just return chunks,
def order_arrays( X, I ):
    # sort both arrays by the labels in I
    indx = I.argsort()
    I = I[indx]
    X = X[indx] # equivalent to X = X[indx,:]
    return X, I

def indexby(X, I, I0=None):
    # X and I must already be sorted by I (see order_arrays)
    if I0 is None:
        d = {}
        for I0 in range(I.max()+1):
            d[I0] = indexby( X, I, I0 )
        return d
    else:
        ii = I.searchsorted(I0)
        ff = I.searchsorted(I0+1)
        sub = X[ii:ff]
        return sub

X,I = order_arrays( X, I )
d = indexby( X, I )
Here I've combined the two previous functions into one recursive function to match the signature you described in your question. This will of course reorder X and I, since both names are rebound to sorted copies of the original arrays.
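If you prefer to avoid the recursion, the same sort-then-chunk idea can be sketched with np.unique and np.split. This is a variation of my own rather than the function above, and it assumes you are fine with keys only for labels that actually occur in I.

```python
import numpy as np

X = np.array([[3.4, 9.13], [3.5, 3.43], [3.6, 2.01], [3.7, 6.11],
              [3.8, 4.95], [3.9, 7.02], [4.0, 4.41]])
I = np.array([0, 1, 2, 0, 1, 2, 3], dtype=np.int32)

# Sort once; a stable sort keeps rows with equal labels in their original order.
order = np.argsort(I, kind='stable')
I_sorted = I[order]
X_sorted = X[order]

# First position of each distinct label in the sorted label array.
labels, starts = np.unique(I_sorted, return_index=True)

# Cut the sorted rows at every boundary between two labels.
chunks = np.split(X_sorted, starts[1:])
d = {int(label): chunk for label, chunk in zip(labels, chunks)}

for key, rows in d.items():
    print(key, rows.shape)
```

This leaves X and I untouched because the sorted copies get their own names.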
np.where is probably a good place to start. – Gabriel Jul 25 '14 at 23:12
np.searchsorted is another nice one – Gabriel Jul 25 '14 at 23:52
pandas (especially if you've worked with SAS/R). It's based on numpy. – Happy001 Jul 26 '14 at 2:09