I have a data structure that looks like this:

data = [ ('a', 1.0, 2.0),
         ('b', 2.0, 4.0),
         ('c', 3.0, 6.0) ]

I want to convert it into a structured array using numpy. However, when I try the following, I keep the floats but I lose the string information:

import numpy
x = numpy.array(data, dtype=[('label', str), ('x', float), ('y', float)])
print(x)

Resulting in:

[('', 1.0, 2.0) ('', 2.0, 4.0) ('', 3.0, 6.0)]

Could anyone explain why this happens, and how I might keep the string information?

x = numpy.array(data, dtype=[('label', (str,1)), ('x', float), ('y', float)]) – luke14free Oct 13 '12 at 14:29

1 Answer

You can see the problem if you print out the array and look carefully:

>>> numpy.array(data, dtype=[('label', str), ('x', float), ('y', float)])
array([('', 1.0, 2.0), ('', 2.0, 4.0), ('', 3.0, 6.0)], 
      dtype=[('label', '|S0'), ('x', '<f8'), ('y', '<f8')])

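The first field has a data type of '|S0', a zero-width string field, so every label is truncated to an empty string. This happens because a plain str carries no length. Give the string field an explicit width; here is a 2-character string field: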

>>> numpy.array(data, dtype=[('label', 'S2'), ('x', float), ('y', float)])
array([('a', 1.0, 2.0), ('b', 2.0, 4.0), ('c', 3.0, 6.0)], 
      dtype=[('label', '|S2'), ('x', '<f8'), ('y', '<f8')])

You can also give the width as a (type, length) tuple, as described in the NumPy dtype documentation:

>>> numpy.array(data, dtype=[('label', (str, 2)), ('x', float), ('y', float)])
array([('a', 1.0, 2.0), ('b', 2.0, 4.0), ('c', 3.0, 6.0)], 
      dtype=[('label', '|S2'), ('x', '<f8'), ('y', '<f8')])
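
If you would rather not hard-code the width, one option is to compute it from the data. The following is only a sketch under a few assumptions: it targets Python 3, it uses the 'U' (fixed-width Unicode) type code instead of the 'S' (byte string) code shown above, and the names width and dtype are purely illustrative.

```python
import numpy

data = [('a', 1.0, 2.0),
        ('b', 2.0, 4.0),
        ('c', 3.0, 6.0)]

# Length of the longest label, so that no label gets truncated.
width = max(len(row[0]) for row in data)

# 'U<n>' declares a fixed-width Unicode string field of n characters
# (an assumption for Python 3; 'S<n>' is the byte-string counterpart).
dtype = [('label', 'U%d' % width), ('x', float), ('y', float)]
x = numpy.array(data, dtype=dtype)

print(x['label'])
print(x.dtype)
```

Note that the width is fixed when the array is created, so a longer label assigned later would still be cut down to that width.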