numpy missing attributes depending on array size

Question

Here is a small code which demonstrates the error I am getting.

import numpy as np

r=4.0
L=20.0
ratio = 4*np.pi / 3.0 * (r/L)**3

for i in range(5, 16):
    n = 10**i
    m = int(ratio * n)
    print i,n,m

    ip = np.random.random_integers(100, size=(n,3))    
    jp = np.random.random_integers(100, size=(m,3))

    a = np.expand_dims(ip, -1) == jp.T
    b = np.where( a.all(axis=1).any(axis=1) )[0]

I get the following output:

5 100000 3351
6 1000000 33510 
Traceback (most recent call last):
  File "example.py", line 16, in <module>
    b = np.where( a.all(axis=1).any(axis=1) )[0]
AttributeError: 'bool' object has no attribute 'all'

Anyone know what is going on here?

Alternatively, a reasonably quick way of indexing the location of elements of jp in ip would also work. I may go with the second solution from here

Where i==5 the size of array a is already on the order of 1 billion elements, where i==6 its 100 billion elements (100 GB of data if bool). Are these extremely large arrays intentional? — Ophion, Jul 26 '13 at 18:49
The size is intentional, they are for a quite dense computational grid. At i==6 np.expand_dims(ip, -1).nbytes == 24000000 which should be within reason, no? — user1984528, Jul 26 '13 at 21:02
Yes that is reasonable, but you are broadcasting this against jp.T to give the above sizes for a. print a.nbytes/1E9 for the size in GB. — Ophion, Jul 26 '13 at 21:06
I see, you're correct. I think instead I will use another approach (now linked in the question) which should avoid that memory issue. Thanks — user1984528, Jul 26 '13 at 21:14

Ophion · Accepted Answer · 2013-08-14 23:59:06Z

You are broadcasting ip against jp creating extremely large arrays. When i==6 you have a 100GB array.

A solution is to loop over the array:

for i in range(2,6):
    t=time.time()
    n = 10**i+1
    m = int(ratio * n)
    print i,n,m

    ip = np.random.random_integers(10, size=(n,3))
    jp = np.random.random_integers(10, size=(m,3))

    chunksize=10000
    if chunksize>ip.shape[0]:
        chunksize=ip.shape[0]
    span=ip.shape[0]/chunksize
    remainder=(ip.shape[0]-span*chunksize)

    out=[]
    start=0
    for n in xrange(span):
        end=start+chunksize
        a = np.expand_dims(ip[start:end], -1) == jp.T
        b = np.where( a.all(axis=1).any(axis=1) )[0]
        out.append(b+start)
        start+=chunksize

    if remainder!=0:
        a = np.expand_dims(ip[-remainder:], -1) == jp.T
        b = np.where( a.all(axis=1).any(axis=1) )[0]
        out.append(b+end)

    end=np.sort(np.concatenate(out))
    print time.time()-t,end.shape

Time is about 10 seconds for i==6 so i==7 will take about 20 minutes.

asked	1 year ago
viewed	170 times
active	1 year ago

current community

your communities

more stack exchange communities

numpy missing attributes depending on array size

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged python numpy or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

numpy missing attributes depending on array size

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python numpy or ask your own question.

Linked

Related

Hot Network Questions