I have two numpy arrays, looking like:
field = np.array([5,1,3,3,2,1,6])
counts = np.array([100,210,300,150,20,90,170])
They are not sorted (and their order shouldn't change). I now want to calculate a third array (of the same length and order) that contains, for each element, the sum of all counts sharing that element's field value. Here the result should be:
field_counts = np.array([100,300,450,450,20,300,170])
The arrays are very long, so iterating through them (and looking up the matching fields each time) is far too inefficient. Maybe I am just not seeing the wood for the trees... I hope someone can help me out with this!
If you need a groupby operation, that's often a sign you should be using pandas instead of numpy; your operation would be something like df.groupby("field")["counts"].transform(sum). – DSM Mar 26 '15 at 20:53
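For anyone who wants to stay in plain numpy, here is a minimal sketch of the same group-then-broadcast idea. It uses np.unique with return_inverse=True to label each element with its group, and np.bincount to sum the counts per group. The names unique_fields, inverse and group_sums are my own, and the sketch assumes 1-D arrays with integer counts, as in the question.

```python
import numpy as np

field = np.array([5, 1, 3, 3, 2, 1, 6])
counts = np.array([100, 210, 300, 150, 20, 90, 170])

# inverse[i] is the position of field[i] within the sorted unique values,
# so it acts as a group label for every element.
unique_fields, inverse = np.unique(field, return_inverse=True)

# Sum the counts belonging to each group label.
# bincount with weights returns floats, one entry per unique field value.
group_sums = np.bincount(inverse, weights=counts)

# Map each group's total back to the original positions and
# restore the integer dtype of the input (assumes integer counts).
field_counts = group_sums[inverse].astype(counts.dtype)

print(field_counts)  # [100 300 450 450  20 300 170]
```

Neither input array is modified or reordered, and there is no Python-level loop; the cost is dominated by the sort inside np.unique.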