I am trying to build a dataset similar to the mnist.pkl.gz file used by Theano's logistic_sgd.py implementation. Here is my code:

import numpy as np
import csv
from PIL import Image
import gzip, cPickle
import theano
from theano import tensor as T

def load_dir_data(csv_file=""):
    print(" reading: %s" %csv_file)
    dataset=[]
    labels=[]

    cr=csv.reader(open(csv_file,"rb"))
    for row in cr:
        print row[0], row[1]
        try: 
            image=Image.open(row[0]+'.jpg').convert('LA') 
            pixels=[f[0] for f in list(image.getdata())]
            dataset.append(pixels)
            labels.append(row[1])
            del image 
        except: 
            print("image not found")
    ret_val=np.array(dataset,dtype=theano.config.floatX)
    return ret_val,np.array(labels).astype(float)   


def generate_pkl_file(csv_file=""):
    Data, y =load_dir_data(csv_file)
    train_set_x = Data[:1500]
    val_set_x = Data[1500:1750]
    test_set_x = Data[1750:1900]
    train_set_y = y[:1500]
    val_set_y = y[1500:1750]
    test_set_y = y[1750:1900]
    # Divided dataset into 3 parts. I had 2000 images.

    train_set = train_set_x, train_set_y
    val_set = val_set_x, val_set_y
    test_set = test_set_x, test_set_y

    dataset = [train_set, val_set, test_set]

    f = gzip.open('file.pkl.gz','wb')
    cPickle.dump(dataset, f, protocol=2)
    f.close()    


if __name__=='__main__':
    generate_pkl_file("trainLabels.csv") 

Error message:

Traceback (most recent call last):
  File "convert_dataset_pkl_file.py", line 50, in <module>
    generate_pkl_file("trainLabels.csv") 
  File "convert_dataset_pkl_file.py", line 29, in generate_pkl_file
    Data, y =load_dir_data(csv_file)
  File "convert_dataset_pkl_file.py", line 24, in load_dir_data
    ret_val=np.array(dataset,dtype=theano.config.floatX)
ValueError: setting an array element with a sequence.

The CSV file contains two fields: image name and classification label. When I run the same steps in the Python interpreter, they seem to work, as shown below. I don't get the "setting an array element with a sequence" error there.

---------python interpreter output----------

image=Image.open('sample.jpg').convert('LA')
pixels=[f[0] for f in list(image.getdata())]
dataset=[]
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
b=numpy.array(dataset,dtype=theano.config.floatX)
b
array([[ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.]])

Even though I am running the same set of instructions (logically), when I run sample.py I get ValueError: setting an array element with a sequence. I am trying to understand this behavior. Any help would be great.

Please always include the full error traceback in your question. – cel May 29 '15 at 5:36
    
Don't just tell us the error. Show us where it occurred. – hpaulj May 29 '15 at 5:48
    
Made edits. I tried with gdb, but there was no stack. – ssh99 May 29 '15 at 5:49
    
What does dataset look like - in both cases. We don't need all the values, but enough to see if there is a difference. – hpaulj May 29 '15 at 6:59
    
The CSV file has only two entries per line:

image,level
sample,2
10_left,0
10_right,0
13_left,0
13_right,0
15_left,1
15_right,2
16_left,4
16_right,4

If I load the images individually in the interpreter and append the pixels, I can call np.array(dataset,dtype=theano.config.floatX), but not when I run it from a file. – ssh99 May 29 '15 at 7:02
Accepted answer (score: 3)

The problem is probably similar to that of this question.

You're trying to create a matrix of pixel values with one row per image. But the images are not all the same size, so the number of pixels differs from row to row.

You can't create a "jagged" float typed array in numpy -- every row must be of the same length.
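
A minimal sketch of that failure, using made-up pixel rows rather than real images. The names `rows`, `lengths`, `jagged_error` and `ok` are hypothetical, and 'float32' is only a stand-in for theano.config.floatX, assumed here so the example runs without Theano. Checking the set of row lengths first is a quick way to see whether the data is jagged before NumPy complains.

```python
import numpy as np

# Hypothetical pixel rows: two "images" with 4 pixels each and one with 6.
rows = [[0, 1, 2, 3], [4, 5, 6, 7], [0, 1, 2, 3, 4, 5]]

# Distinct row lengths; more than one value means the data is jagged.
lengths = sorted(set(len(r) for r in rows))
print(lengths)  # [4, 6]

try:
    np.array(rows, dtype='float32')
    jagged_error = None
except ValueError as err:
    # NumPy refuses to build a 2-D float array from rows of unequal length.
    jagged_error = err
print(jagged_error)

# Rows of equal length convert without complaint.
ok = np.array(rows[:2], dtype='float32')
print(ok.shape)  # (2, 4)
```

Appending the same image several times, as in the interpreter session in the question, always gives equal-length rows, which would explain why the error does not show up there.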

You'll need to pad each row to the length of the largest image.
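
One way the padding could be done is sketched below. `pad_rows` is a hypothetical helper, not part of NumPy or Theano, and the fill value of 0 and the 'float32' dtype are assumptions chosen for illustration.

```python
import numpy as np

def pad_rows(rows, fill=0.0, dtype='float32'):
    """Return a 2-D array with every row right-padded with `fill`.

    Assumes `rows` is a non-empty list of flat pixel lists.
    """
    width = max(len(r) for r in rows)          # length of the longest row
    out = np.full((len(rows), width), fill, dtype=dtype)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r                    # copy pixels, leave the rest as fill
    return out

# Hypothetical rows of unequal length.
rows = [[1, 2, 3], [4, 5], [6]]
padded = pad_rows(rows)
print(padded.shape)  # (3, 3)
```

Note that padding a flattened image shifts which pixel lands in which column when the image widths differ, so resizing every image to one common size before flattening may be the better fix in practice.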

    
P.S. if you had just searched for 'numpy "setting an array element with a sequence"' the very first result (I see) in Google is the StackOverflow question I linked to. – Daniel Renshaw May 29 '15 at 8:22
    
No, all the images are the same size. It works for me in the Python interpreter, but not when I run it as a file. – ssh99 May 29 '15 at 8:53
    
Your Python interpreter example shows the same image being added many times, not many different images. – Daniel Renshaw May 29 '15 at 8:57
    
Sorry, my bad. I have two different image sizes. I will change that and check. Thanks. – ssh99 May 29 '15 at 9:03
    
Worked, thanks. – ssh99 May 29 '15 at 9:22
