Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

Here is a small code which demonstrates the error I am getting.

import numpy as np

r=4.0
L=20.0
ratio = 4*np.pi / 3.0 * (r/L)**3

for i in range(5, 16):
    n = 10**i
    m = int(ratio * n)
    print i,n,m

    ip = np.random.random_integers(100, size=(n,3))    
    jp = np.random.random_integers(100, size=(m,3))

    a = np.expand_dims(ip, -1) == jp.T
    b = np.where( a.all(axis=1).any(axis=1) )[0]

I get the following output:

5 100000 3351
6 1000000 33510 
Traceback (most recent call last):
  File "example.py", line 16, in <module>
    b = np.where( a.all(axis=1).any(axis=1) )[0]
AttributeError: 'bool' object has no attribute 'all'

Anyone know what is going on here?

Alternatively, a reasonably quick way of indexing the location of elements of jp in ip would also work. I may go with the second solution from here

share|improve this question
1  
Where i==5 the size of array a is already on the order of 1 billion elements, where i==6 its 100 billion elements (100 GB of data if bool). Are these extremely large arrays intentional? –  Ophion Jul 26 '13 at 18:49
    
The size is intentional, they are for a quite dense computational grid. At i==6 np.expand_dims(ip, -1).nbytes == 24000000 which should be within reason, no? –  user1984528 Jul 26 '13 at 21:02
    
Yes that is reasonable, but you are broadcasting this against jp.T to give the above sizes for a. print a.nbytes/1E9 for the size in GB. –  Ophion Jul 26 '13 at 21:06
    
I see, you're correct. I think instead I will use another approach (now linked in the question) which should avoid that memory issue. Thanks –  user1984528 Jul 26 '13 at 21:14

1 Answer 1

up vote 0 down vote accepted

You are broadcasting ip against jp creating extremely large arrays. When i==6 you have a 100GB array.

A solution is to loop over the array:

for i in range(2,6):
    t=time.time()
    n = 10**i+1
    m = int(ratio * n)
    print i,n,m

    ip = np.random.random_integers(10, size=(n,3))
    jp = np.random.random_integers(10, size=(m,3))

    chunksize=10000
    if chunksize>ip.shape[0]:
        chunksize=ip.shape[0]
    span=ip.shape[0]/chunksize
    remainder=(ip.shape[0]-span*chunksize)

    out=[]
    start=0
    for n in xrange(span):
        end=start+chunksize
        a = np.expand_dims(ip[start:end], -1) == jp.T
        b = np.where( a.all(axis=1).any(axis=1) )[0]
        out.append(b+start)
        start+=chunksize

    if remainder!=0:
        a = np.expand_dims(ip[-remainder:], -1) == jp.T
        b = np.where( a.all(axis=1).any(axis=1) )[0]
        out.append(b+end)

    end=np.sort(np.concatenate(out))
    print time.time()-t,end.shape

Time is about 10 seconds for i==6 so i==7 will take about 20 minutes.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.