
I am trying to build a dataset similar to the mnist.pkl.gz file used by Theano's logistic_sgd.py implementation. The following is my code snippet.

import numpy as np
import csv
from PIL import Image
import gzip, cPickle
import theano
from theano import tensor as T

def load_dir_data(csv_file=""):
    print(" reading: %s" %csv_file)
    dataset=[]
    labels=[]

    cr=csv.reader(open(csv_file,"rb"))
    for row in cr:
        print row[0], row[1]
        try: 
            image=Image.open(row[0]+'.jpg').convert('LA') 
            pixels=[f[0] for f in list(image.getdata())]
            dataset.append(pixels)
            labels.append(row[1])
            del image 
        except: 
            print("image not found")
    ret_val=np.array(dataset,dtype=theano.config.floatX)
    return ret_val,np.array(labels).astype(float)   


def generate_pkl_file(csv_file=""):
    Data, y =load_dir_data(csv_file)
    # Slices are contiguous so that rows 1500 and 1750 are not skipped.
    train_set_x = Data[:1500]
    val_set_x = Data[1500:1750]
    test_set_x = Data[1750:1900]
    train_set_y = y[:1500]
    val_set_y = y[1500:1750]
    test_set_y = y[1750:1900]
    # Divided dataset into 3 parts. I had 2000 images.

    train_set = train_set_x, train_set_y
    val_set = val_set_x, val_set_y
    test_set = test_set_x, test_set_y

    dataset = [train_set, val_set, test_set]

    f = gzip.open('file.pkl.gz','wb')
    cPickle.dump(dataset, f, protocol=2)
    f.close()    


if __name__=='__main__':
    generate_pkl_file("trainLabels.csv") 

Error message:

Traceback (most recent call last):
  File "convert_dataset_pkl_file.py", line 50, in <module>
    generate_pkl_file("trainLabels.csv") 
  File "convert_dataset_pkl_file.py", line 29, in generate_pkl_file
    Data, y =load_dir_data(csv_file)
  File "convert_dataset_pkl_file.py", line 24, in load_dir_data
    ret_val=np.array(dataset,dtype=theano.config.floatX)
ValueError: setting an array element with a sequence.

The CSV file contains two fields: image name and classification label. When I run the same steps in the Python interpreter, they seem to work, as follows. I don't get the "setting an array element with a sequence" error here.

---------python interpreter output----------

image=Image.open('sample.jpg').convert('LA')
pixels=[f[0] for f in list(image.getdata())]
dataset=[]
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
b=numpy.array(dataset,dtype=theano.config.floatX)
b
array([[ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.]])

Even though I am running (logically) the same set of instructions, when I run sample.py I get ValueError: setting an array element with a sequence. I am trying to understand this behavior. Any help would be great.

  • Please always include the full error traceback in your question. Commented May 29, 2015 at 5:36
  • Don't just tell us the error. Show us where it occurred. Commented May 29, 2015 at 5:48
  • Made edits. I tried with gdb, but there was no stack. Commented May 29, 2015 at 5:49
  • What does dataset look like - in both cases. We don't need all the values, but enough to see if there is a difference. Commented May 29, 2015 at 6:59
  • CSV contents: image,level; sample,2; 10_left,0; 10_right,0; 13_left,0; 13_right,0; 15_left,1; 15_right,2; 16_left,4; 16_right,4. It is a CSV file with only two entries per line. If I load images individually in the interpreter and append the pixels, I can run np.array(dataset,dtype=theano.config.floatX), but not when I run it from the file. Commented May 29, 2015 at 7:02

1 Answer


The problem is probably similar to that of this question.

You're trying to create a matrix of pixel values with one row per image. But the images are not all the same size, so the number of pixels differs from row to row.

You can't create a "jagged" float-typed array in NumPy -- every row must be of the same length.
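A minimal sketch of how the mismatch could be detected before calling np.array. The `rows` list and the helper names `row_lengths` and `can_stack` are hypothetical, chosen for illustration; they do not come from the question's code.

```python
import numpy as np

# Hypothetical stand-ins for flattened images of two different sizes.
rows = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]


def row_lengths(rows):
    """Return the sorted list of distinct row lengths."""
    return sorted(set(len(r) for r in rows))


def can_stack(rows):
    """Report whether NumPy accepts the rows as a float array."""
    try:
        np.array(rows, dtype=float)
        return True
    except ValueError:
        # Rows of unequal length raise ValueError
        # ("setting an array element with a sequence").
        return False
```

If `row_lengths` returns more than one value, the rows are jagged and the conversion to a float array is expected to fail.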

You'll need to pad each row to the length of the largest image.
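The padding step might look like the sketch below. The function name `pad_rows` and the choice of 0.0 as the fill value are assumptions made for illustration, not something stated in the answer.

```python
import numpy as np


def pad_rows(rows, fill_value=0.0, dtype=float):
    """Right-pad each row with fill_value up to the longest row's length.

    Assumes rows is a non-empty list of flat sequences of numbers.
    """
    max_len = max(len(r) for r in rows)
    # Start from an array filled with the padding value, then copy
    # each row into the left part of its slot.
    padded = np.full((len(rows), max_len), fill_value, dtype=dtype)
    for i, r in enumerate(rows):
        padded[i, :len(r)] = r
    return padded
```

For example, `pad_rows(dataset, dtype=theano.config.floatX)` could stand in for the failing np.array call in load_dir_data. Note that padding a flattened image does not preserve its 2-D layout, so resizing every image to one common size before flattening (for instance with PIL's Image.resize) may suit a classifier better; that is a design choice beyond what the answer states.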

  • P.S. if you had just searched for 'numpy "setting an array element with a sequence"' the very first result (I see) in Google is the StackOverflow question I linked to. Commented May 29, 2015 at 8:22
  • No. All images are the same size. It works for me in the Python interpreter, but not when I run it as a file. Commented May 29, 2015 at 8:53
  • Your Python interpreter example shows the same image being added many times, not many different images. Commented May 29, 2015 at 8:57
  • Sorry, my bad. I have two different image sizes. I will change that and check it. Thanks. Commented May 29, 2015 at 9:03
