I have been trying to implement some modification to speed up this pseudo code:

>>> A=np.array([1,1,1,2,2,2,3,3,3])
>>> B=np.array([np.power(A,n) for n in [3,4,5]])
>>> B
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])

Where elements of A are often repeated 10-20 times and the shape of B needs to be retained because it is multiplied by another array of the same shape later.

My first idea was to use the following code:

uA=np.unique(A)
uB=np.array([np.power(uA,n) for n in [3,4,5]])
B=[]
for num in range(uB.shape[0]):
    Temp=np.copy(A)
    for k,v in zip(uA,uB[num]): Temp[A==k] = v
    B.append(Temp)
B=np.array(B)
### Also any better way to create the numpy array B?

This seems fairly terrible and there is likely a better way. Any idea on how to speed this up would be much appreciated.

Thank you for your time.

Here is an update. I realized that my function was poorly coded. A thank you to everyone for the suggestions. I will try to rephrase my questions better in the future so that they show everything required.

Normal='''
import numpy as np
import scipy
def func(value,n):
    if n==0: return 1
    else: return np.power(value,n)/scipy.factorial(n,exact=0)+func(value,n-1)
A=np.random.randint(10,size=250)
A=np.unique(A)
B=np.array([func(A,n) for n in [6,8,10]])
'''

Me='''
import numpy as np
import scipy
def func(value,n):
    if n==0: return 1
    else: return np.power(value,n)/scipy.factorial(n,exact=0)+func(value,n-1)
A=np.random.randint(10,size=250)
uA=np.unique(A)
uB=np.array([func(A,n) for n in [6,8,10]])
B=[]
for num in range(uB.shape[0]):
    Temp=np.copy(A)
    for k,v in zip(uA,uB[num]): Temp[A==k] = v
    B.append(Temp)
B=np.array(B)
'''


Alex='''
import numpy as np
import scipy
A=np.random.randint(10,size=250)
power=np.arange(11)
fact=scipy.factorial(np.arange(11),exact=0).reshape(-1,1)
power=np.power(A,np.arange(11).reshape(-1,1))
value=power/fact
six=np.sum(value[:6],axis=0)
eight=six+np.sum(value[6:8],axis=0)
ten=eight+np.sum(value[8:],axis=0)
B=np.vstack((six,eight,ten))
'''
Alex='''
import numpy as np
import scipy
A=np.random.randint(10,size=250)
power=np.arange(11)
fact=scipy.factorial(np.arange(11),exact=0).reshape(-1,1)
power=np.power(A,np.arange(11).reshape(-1,1))
value=power/fact
six=np.sum(value[:6],axis=0)
eight=six+np.sum(value[6:8],axis=0)
ten=eight+np.sum(value[8:],axis=0)
B=np.vstack((six,eight,ten))
'''

Alex2='''
import numpy as np
import scipy
def find_count(the_list):
    count = list(the_list).count
    result = [count(item) for item in set(the_list)]
    return result
A=np.random.randint(10,size=250)
A_unique=np.unique(A)
A_counts = np.array(find_count(A_unique))
fact=scipy.factorial(np.arange(11),exact=0).reshape(-1,1)
power=np.power(A_unique,np.arange(11).reshape(-1,1))
value=power/fact
six=np.sum(value[:6],axis=0)
eight=six+np.sum(value[6:8],axis=0)
ten=eight+np.sum(value[8:],axis=0)
B_nodup=np.vstack((six,eight,ten))
B_list = [ np.transpose( np.tile( B_nodup[:,i], (A_counts[i], 1) ) ) for i in range(A_unique.shape[0]) ]
B = np.hstack( B_list )
'''


print timeit.timeit(Normal, number=10000)
print timeit.timeit(Me, number=10000)
print timeit.timeit(Alex, number=10000)
print timeit.timeit(Alex2, number=10000)

Normal: 10.7544178963
Me:     23.2039361
Alex:    4.85648703575
Alex2:   4.18024992943
share|improve this question

Append is slow, so is a python list compared to ndarray. Don't you already know the size of B? Declare it before the loop and just insert values. – user948652 21 hours ago
Can you expand on this some. Do you mean np.empty(shape) then fill in the values? – Ophion 21 hours ago
Yes. If you are taking the power of each element to 3 different powers (3,4,5) in example then you know you get lengh of A * 3 elements. – user948652 21 hours ago
Updated the pseudo code so that it works. – Ophion 21 hours ago
feedback

3 Answers

up vote 1 down vote accepted

Use a combination of numpy.tile() and numpy.hstack(), as follows:

A = np.array([1,2,3])
A_counts = np.array([3,3,3])
A_powers = np.array([[3],[4],[5]])
B_nodup = np.power(A, A_powers)
B_list = [ np.transpose( np.tile( B_nodup[:,i], (A_counts[i], 1) ) ) for i in range(A.shape[0]) ]
B = np.hstack( B_list )

The transpose and stack may be reversed, this may be faster:

B_list = [ np.tile( B_nodup[:,i], (A_counts[i], 1) ) for i in range(A.shape[0]) ]
B = np.transpose( np.vstack( B_list ) )

This is likely only worth doing if the function you are calculating is quite expensive, or it is duplicated many, many times (more than 10); doing a tile and stack to prevent calculating the power function an extra 10 times is likely not worth it. Please benchmark and let us know.

EDIT: Or, you could just use broadcasting to get rid of the list comprehension:

>>> A=np.array([1,1,1,2,2,2,3,3,3])
>>> B = np.power(A,[[3],[4],[5]])
>>> B
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])

This is probably pretty fast, but doesn't actually do what you asked.

share|improve this answer
Im not sure why people are so against list comprehension. In this scenario its about the same/slightly better then these other options. – Ophion 21 hours ago
2  
@Ophion: are you sure? Please compare with the broadcasting method above, not the one suggested by larsmans below (which requires an extra reshape and transpose). I suspect broadcasting wins by a lot for large arrays (try 1k by 1k or so). – Alex I 20 hours ago
@AlexI: your version effectively reshapes [3,4,5]. When I run it on a larger example, it's a tad slower than my version, though probably not significantly so. reshape and transpose are very cheap. – larsmans 20 hours ago
@Alex: Ah yes, when there are many rows the broadcasting method above is much faster. This typically only has about three rows so the differences are fairly negligible. Its certainly interesting and I will use it in the future. For the np.tile I will give it a go; however, the order is important and not always sorted. – Ophion 20 hours ago
@Ophion: the np.tile is used to make the desired number of copies of each column in the result (given by A_counts); it doesn't require anything to be sorted. – Alex I 19 hours ago
feedback

My go at it with 200k iterations, the first method is mine.

import numpy as np
import time

N = 200000
start = time.time()
for j in range(N):

    x = np.array([1,1,1,2,2,2,3,3,3])
    powers = np.array([3,4,5])
    result = np.zeros((powers.size,x.size)).astype(np.int32)
    for i in range(powers.size):
        result[i,:] = x**powers[i]
print time.time()-start, "seconds"

start = time.time()
for j in range(N):
    A=np.array([1,1,1,2,2,2,3,3,3])
    B = np.power(A,[[3],[4],[5]])
print time.time()-start, "seconds"

start = time.time()
for j in range(N):
    np.power(A.reshape(-1,1), [3,4,5]).T
print time.time()-start, "seconds"

start = time.time()
for j in range(N):
    A=np.array([1,1,1,2,2,2,3,3,3])
    B=np.array([np.power(x,n) for n in [3,4,5]])
print time.time()-start, "seconds"

Produces

8.88000011444 seconds
9.25099992752 seconds
3.95399999619 seconds
7.43799996376 seconds

larsmans method is clearly fastest.

(ps how do you link to an answer or user here without explicit url @larsman doesnt work)

share|improve this answer
1  
You forgot to initialize A for every loop of larsmans answer. Updating for this, using timeit, and A=np.random.randint(20, size=500) initialize at the very start you will find they are very close together. Thanks for posting this. – Ophion 20 hours ago
That's @larsmans, not larsman :) – larsmans 20 hours ago
feedback

You can broadcast np.power across A if you change its shape to that of a column vector.

>>> np.power(A.reshape(-1,1), [3,4,5]).T
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])
share|improve this answer
Any intuitive reason why (-1,1) is a coulm and (1,-1) is a row? Just trying to understand the logic. Or is it "just because"? – user948652 21 hours ago
1  
@user948652: try printing them. Or, if you're looking for a reason why the first index is the row index: that's just a convention. – larsmans 21 hours ago
Ah should have googled first, thanks anyway :) newshape : int or tuple of ints The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions. – user948652 21 hours ago
This actually takes more time then list comprehension by about 5% according to timeit. – Ophion 21 hours ago
@Ophion: on my box, it's about 20% faster for your example data. When A.size == 21 (7 distinct values), it's about twice as fast. – larsmans 20 hours ago
feedback

Your Answer

 
or
required, but never shown
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.