I'm trying to build a numpy array of arrays of arrays with the code below, which gives me:

ValueError: setting an array element with a sequence.

My guess is that in numpy I need to declare the arrays as multi-dimensional from the beginning, but I'm not sure.

How can I fix the code below so that I can build an array of arrays of arrays?

from PIL import Image
import pickle
import os
import numpy

indir1 = 'PositiveResize'

trainimage = numpy.empty(2)
trainpixels = numpy.empty(80000)
trainlabels = numpy.empty(80000)
validimage = numpy.empty(2)
validpixels = numpy.empty(10000)
validlabels = numpy.empty(10000)
testimage = numpy.empty(2)
testpixels = numpy.empty(10408)
testlabels = numpy.empty(10408)

i=0
tr=0
va=0
te=0
for (root, dirs, filenames) in os.walk(indir1):
    print 'hello'
    for f in filenames:
            try:
                    im = Image.open(os.path.join(root,f))
                    Imv=im.load()
                    x,y=im.size
                    pixelv = numpy.empty(6400)
                    ind=0
                    for i in range(x):
                            for j in range(y):
                                    temp=float(Imv[j,i])
                                    temp=float(temp/255.0)
                                    pixelv[ind]=temp
                                    ind+=1
                    if i<40000:
                            trainpixels[tr]=pixelv
                            tr+=1
                    elif i<45000:
                            validpixels[va]=pixelv
                            va+=1
                    else:
                            testpixels[te]=pixelv
                            te+=1
                    print str(i)+'\t'+str(f)
                    i+=1
            except IOError:
                    continue

trainimage[0]=trainpixels
trainimage[1]=trainlabels
validimage[0]=validpixels
validimage[1]=validlabels
testimage[0]=testpixels
testimage[1]=testlabels
    
Are the images all the same size? If so, you can pre-declare a 3d array. If not, you can declare a 1d array with dtype=object (spelled numpy.object in older NumPy versions); its elements can be set with a sequence, or with any Python object of your choosing, by definition. – Eelco Hoogendoorn Aug 5 '14 at 16:48
up vote 1 down vote accepted

Don't try to smash your entire object into a numpy array. If you have distinct things, use a numpy array for each one then use an appropriate data structure to hold them together.

For instance, if you want to do computations across images then you probably want to just store the pixels and labels in separate arrays.

import numpy as np

trainpixels = np.empty([10000, 80, 80])  # one 80x80 pixel matrix per image
trainlabels = np.empty(10000)            # one label per image
for i in range(10000):
    trainpixels[i] = ...
    trainlabels[i] = ...

To access an individual image's data:

imagepixels = trainpixels[253]
imagelabel = trainlabels[253]

And you can easily do stuff like compute summary statistics over the images.

meanimage = np.mean(trainpixels, axis=0)
meanlabel = np.mean(trainlabels)
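
As a runnable version of the fragments above, here is a minimal sketch with made-up data. The random 8x8 arrays stand in for real images, and the sizes and label values are assumptions chosen only to keep the example small.

```python
import numpy as np

# Hypothetical sizes, much smaller than the real data set.
n_images, height, width = 6, 8, 8

rng = np.random.default_rng(0)
trainpixels = np.empty((n_images, height, width))
trainlabels = np.empty(n_images, dtype=int)

for i in range(n_images):
    # Stand-in for loading one image: random 0-255 values scaled to [0, 1].
    fake_image = rng.integers(0, 256, size=(height, width))
    trainpixels[i] = fake_image / 255.0  # assigns a whole 2-D slice at once
    trainlabels[i] = i % 2               # made-up label

imagepixels = trainpixels[3]              # pixels of a single image
meanimage = np.mean(trainpixels, axis=0)  # average over the image axis
```

Because the first axis indexes the image, assigning to trainpixels[i] stores a full 2-D matrix, which is what the 1-D arrays in the question could not do.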

If you really want all the data in the same object, you should probably use a structured array, as Eelco Hoogendoorn suggests. Some example usage:

# Construction and assignment
# (np.int was removed in NumPy 1.24; the builtin int is equivalent)
trainimages = np.empty(10000, dtype=[('label', int), ('pixel', int, (80, 80))])
for i in range(10000):
    trainimages['label'][i] = ...
    trainimages['pixel'][i] = ...

# Summary statistics
meanimage = np.mean(trainimages['pixel'], axis=0)
meanlabel = np.mean(trainimages['label'])

# Accessing a single image
image = trainimages[253]
imagepixels, imagelabel = trainimages[['pixel', 'label']][253]
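
For a version that runs end to end, here is a small sketch of the same idea. The sizes (5 images of 4x4 pixels) and the fill values are made up, and the pixel field uses float rather than int on the assumption that the pixels are scaled to [0, 1] as in the question.

```python
import numpy as np

# Hypothetical record layout: one integer label plus a 4x4 float pixel matrix.
image_dtype = np.dtype([('label', int), ('pixel', float, (4, 4))])
trainimages = np.zeros(5, dtype=image_dtype)

for i in range(5):
    trainimages['label'][i] = i
    trainimages['pixel'][i] = np.full((4, 4), i / 10.0)  # made-up pixel values

# A single record carries both fields.
record = trainimages[2]
recordlabel = record['label']
recordpixels = record['pixel']

# Field access returns an ordinary array, so cross-image functions still work.
meanimage = trainimages['pixel'].mean(axis=0)
```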

Alternatively, if you want to process each one separately, you could store each image's data in separate arrays and bind them together in a tuple or dictionary, then store all of that in a list.

trainimages = []
for i in range(10000):
    pixels = ...
    label = ...
    image = (pixels, label)
    trainimages.append(image)

Now to access a single image's data:

imagepixels, imagelabel = trainimages[253]

This makes it more intuitive to access a single image, but because all the data is not in one big numpy array you don't get easy access to functions that work across images.
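
If you start with the list and later need a cross-image computation, one option is to stack the per-image arrays at that point. This is only a sketch with made-up 3x3 "images", and it assumes every image has the same shape, which np.stack requires.

```python
import numpy as np

trainimages = []
for i in range(4):
    pixels = np.full((3, 3), float(i))  # made-up 3x3 "image"
    label = i % 2                       # made-up label
    trainimages.append((pixels, label))

# Gather the per-image arrays into one array only when it is needed.
allpixels = np.stack([pixels for pixels, _ in trainimages])
alllabels = np.array([label for _, label in trainimages])
meanimage = allpixels.mean(axis=0)
```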

while separate arrays may be preferable, this isn't a necessity; you can also use a struct array, i.e.: data = np.empty(10000, dtype=[('labels', int), ('pixels', int, (80,80))]) – Eelco Hoogendoorn Aug 5 '14 at 16:46
    
@EelcoHoogendoorn Good solution as well. I always forget about struct arrays. – Roger Fan Aug 5 '14 at 16:51

Refer to the examples in numpy.empty:

>>> np.empty([2, 2])
array([[ -9.74499359e+001,   6.69583040e-309],
       [  2.13182611e-314,   3.06959433e-309]])         #random

Pass the full N-dimensional shape when you create the array:

testpixels = numpy.empty([96, 96])
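
To see why the shape matters, here is a small sketch that reproduces the error from the question and then avoids it. The sizes (3 slots, 4 values per row) are arbitrary and chosen only for illustration.

```python
import numpy as np

flat = np.empty(3)   # 1-D: each slot holds exactly one float
row = np.zeros(4)

error_message = None
try:
    flat[0] = row    # tries to store 4 values in a single slot
except ValueError as exc:
    error_message = str(exc)

table = np.empty((3, 4))  # 2-D: each row has room for 4 values
table[0] = row            # the same assignment now succeeds
```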
    
but testpixels contains many arrays (about 10000 instances) of, say, [80,80] pixel matrices. How would I declare testpixels in that case? – ytrewq Aug 5 '14 at 15:52
    
testpixels = numpy.empty([80,80,10000]) – Brian Cain Aug 5 '14 at 15:52
    
testlabels on the other hand is just an array of integers (not array of arrays like testpixels). Then, if I were to have testimage to have testpixels and testlabels as its two elements, how should I define testimage? – ytrewq Aug 5 '14 at 15:54
    
You can create aliases of the views by slicing. See docs.scipy.org/doc/numpy/reference/arrays.indexing.html -- Maybe testpixels = numpy.empty([10000,80,80]); testimage = testpixels[0] – Brian Cain Aug 5 '14 at 15:57
    
sorry.. slicing looks like it's about accessing the index in different ways, but I'm not sure how that can be applied to set testimage[0] to [80,80,10000] dimensions, while setting testimage[1] to array of integers.. – ytrewq Aug 5 '14 at 16:01
