Converting a 2D numpy array to a structured array

Question

I'm trying to convert a two-dimensional array into a structured array with named fields. I want each row in the 2D array to be a new record in the structured array. Unfortunately, nothing I've tried is working the way I expect.

I'm starting with:

>>> myarray = numpy.array([("Hello",2.5,3),("World",3.6,2)])
>>> print myarray
[['Hello' '2.5' '3']
 ['World' '3.6' '2']]

I want to convert to something that looks like this:

>>> newarray = numpy.array([("Hello",2.5,3),("World",3.6,2)], dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")])
>>> print newarray
[('Hello', 2.5, 3L) ('World', 3.6000000000000001, 2L)]

What I've tried:

>>> newarray = myarray.astype([("Col1","S8"),("Col2","f8"),("Col3","i8")])
>>> print newarray
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)]
 [('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]]

>>> newarray = numpy.array(myarray, dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")])
>>> print newarray
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)]
 [('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]]

Both of these approaches attempt to convert each entry in myarray into a record with the given dtype, so the extra zeros are inserted. I can't figure out how to get it to convert each row into a record.

Another attempt:

>>> newarray = myarray.copy()
>>> newarray.dtype = [("Col1","S8"),("Col2","f8"),("Col3","i8")]
>>> print newarray
[[('Hello', 1.7219343871178711e-317, 51L)]
 [('World', 1.7543139673493688e-317, 50L)]]

This time no actual conversion is performed. The existing data in memory is just re-interpreted as the new data type.
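
The difference between re-interpreting memory and converting values is easier to see on a smaller case. This is only a minimal sketch, assuming a plain float64 array rather than the string array above: `view` keeps the bytes and changes how they are read, while `astype` computes new values.

```python
import numpy as np

a = np.array([1.0, 2.0])           # float64, 8 bytes per element

# Re-interpretation: same bytes, read as 64-bit integers (no conversion).
bits = a.view(np.int64)

# Conversion: each value is cast, so a new buffer is produced.
converted = a.astype(np.int64)

print(bits)        # raw IEEE 754 bit patterns, not 1 and 2
print(converted)   # the numeric values as integers
```

Setting `newarray.dtype` directly behaves like the `view` call here, which is why the numbers come out as garbage.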

The array that I'm starting with is being read in from a text file. The data types are not known ahead of time, so I can't set the dtype at the time of creation. I need a high-performance and elegant solution that will work well for general cases since I will be doing this type of conversion many, many times for a large variety of applications.

Thanks!

Matthew Rankin · Accepted Answer · 2013-02-26 20:17:04Z

You can "create a record array from a (flat) list of arrays" using numpy.core.records.fromarrays as follows:

>>> import numpy as np
>>> myarray = np.array([("Hello",2.5,3),("World",3.6,2)])
>>> print myarray
[['Hello' '2.5' '3']
 ['World' '3.6' '2']]


>>> newrecarray = np.core.records.fromarrays(myarray.transpose(), 
                                             names='col1, col2, col3',
                                             formats = 'S8, f8, i8')

>>> print newrecarray
[('Hello', 2.5, 3) ('World', 3.5999999046325684, 2)]

I was trying to do something similar. I found that np.core.records.fromarrays expects one array per field, so when it is handed a 2D array it treats each row of that array as a field (a column of the result) rather than as a record. That is why you have to transpose the array first. This behavior does not seem very intuitive at first, but it matches the function's purpose: it builds the record array from a list of column arrays.
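
For readers on Python 3, here is a hedged sketch of the same call. It assumes a recent NumPy where `numpy.rec` is the public name for this module (as far as I know, `numpy.core` is treated as private in NumPy 2.x), and it uses `U8` instead of `S8` so the text field holds `str` rather than `bytes`. Passing names and formats as lists is my choice, not part of the original answer.

```python
import numpy as np

# Mixed tuples are coerced to a 2D array of strings, as in the question.
myarray = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])

# fromarrays wants one array per field, hence the transpose.
newrecarray = np.rec.fromarrays(
    myarray.transpose(),
    names=["col1", "col2", "col3"],
    formats=["U8", "f8", "i8"],
)

print(newrecarray)
print(newrecarray.col2)   # fields are also available as attributes
```

The string columns are cast to the requested field types when they are copied in, so `col2` and `col3` end up numeric.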

gnibbler · Answer 2 · 2010-09-01 23:41:01Z

>>> import numpy
>>> myarray = numpy.array([("Hello",2.5,3),("World",3.6,2)], dtype=tuple)
>>> print myarray
[[Hello 2.5 3]
 [World 3.6 2]]
>>> myarray.tolist()
[['Hello', 2.5, 3], ['World', 3.6000000000000001, 2]]

Adding tuple as the dtype in the definition of myarray doesn't seem to have changed anything. Also, I need the output to be a structured array with dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")]. I'm looking for a solution that does not involve converting to a list (for performance reasons). – Emma Sep 2 '10 at 0:06

@Emma, adding the dtype of tuple prevents all the items from being converted to strings, i.e. the numeric entries are still numbers. If that is not what you want, can you please clarify? – gnibbler Sep 2 '10 at 0:20
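
To make the point in this exchange concrete, here is a small sketch. It assumes `dtype=object` as the explicit way to ask for an array of Python objects (the session above ends up with one after passing `tuple`), and it uses `U8` for the text field so the example reads naturally on Python 3. It does go through Python tuples, so it does not address the performance concern raised in the comment.

```python
import numpy as np

# With an object dtype the numbers stay numbers instead of becoming strings.
objects = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)], dtype=object)
print(type(objects[0, 1]))   # a Python float, not a string

# A structured array can be built from a list of tuples, one per record.
dt = np.dtype([("Col1", "U8"), ("Col2", "f8"), ("Col3", "i8")])
structured = np.array([tuple(row) for row in objects], dtype=dt)
print(structured)
```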

Philip Lawrence · Answer 3 · 2013-03-01 13:50:21Z

Okay, I have been struggling with this for a while now but I have found a way to do this that doesn't take too much effort. I apologise if this code is "dirty"....

Let's start with a 2D array:

mydata = numpy.array([['text1', 1, 'longertext1', 0.1111],
                     ['text2', 2, 'longertext2', 0.2222],
                     ['text3', 3, 'longertext3', 0.3333],
                     ['text4', 4, 'longertext4', 0.4444],
                     ['text5', 5, 'longertext5', 0.5555]])

So we end up with a 2D array with 4 columns and 5 rows:

mydata.shape
Out[30]: (5L, 4L)

To use numpy.core.records.array, we need to supply the input argument as a list of arrays, so:

tuple(mydata)
Out[31]: 
(array(['text1', '1', 'longertext1', '0.1111'], 
      dtype='|S11'),
 array(['text2', '2', 'longertext2', '0.2222'], 
      dtype='|S11'),
 array(['text3', '3', 'longertext3', '0.3333'], 
      dtype='|S11'),
 array(['text4', '4', 'longertext4', '0.4444'], 
      dtype='|S11'),
 array(['text5', '5', 'longertext5', '0.5555'], 
      dtype='|S11'))

This produces a separate array per row of data, but we need the input arrays to be by column, so what we need is:

tuple(mydata.transpose())
Out[32]: 
(array(['text1', 'text2', 'text3', 'text4', 'text5'], 
      dtype='|S11'),
 array(['1', '2', '3', '4', '5'], 
      dtype='|S11'),
 array(['longertext1', 'longertext2', 'longertext3', 'longertext4',
       'longertext5'], 
      dtype='|S11'),
 array(['0.1111', '0.2222', '0.3333', '0.4444', '0.5555'], 
      dtype='|S11'))

Finally it needs to be a list of arrays, not a tuple, so we wrap the above in list() as below:

list(tuple(mydata.transpose()))

That is our data input argument sorted.... next is the dtype:

mydtype = numpy.dtype([('My short text Column', 'S5'),
                       ('My integer Column', numpy.int16),
                       ('My long text Column', 'S11'),
                       ('My float Column', numpy.float32)])
mydtype
Out[37]: dtype([('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])

Okay, so now we can pass the data and the dtype to numpy.core.records.array():

myRecord = numpy.core.records.array(list(tuple(mydata.transpose())), dtype=mydtype)

... and fingers crossed:

myRecord
Out[36]: 
rec.array([('text1', 1, 'longertext1', 0.11110000312328339),
       ('text2', 2, 'longertext2', 0.22220000624656677),
       ('text3', 3, 'longertext3', 0.33329999446868896),
       ('text4', 4, 'longertext4', 0.44440001249313354),
       ('text5', 5, 'longertext5', 0.5554999709129333)], 
      dtype=[('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])

Voila! You can index by column name as in:

myRecord['My float Column']
Out[39]: array([ 0.1111    ,  0.22220001,  0.33329999,  0.44440001,  0.55549997], dtype=float32)

I hope this helps as I wasted so much time with numpy.asarray and mydata.astype etc trying to get this to work before finally working out this method.
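
The steps above can also be written out by hand, which makes it clearer what happens to each column. This is only a sketch under a few assumptions: `rows_to_structured` is a hypothetical helper name (not a NumPy function), the text fields use `U` rather than `S` so the values are `str` on Python 3, and every column is assumed to hold strings that parse cleanly as the target type.

```python
import numpy as np

def rows_to_structured(table, dtype):
    """Hypothetical helper: copy each column of a 2D array into a named field."""
    table = np.asarray(table)
    dtype = np.dtype(dtype)
    if table.ndim != 2 or table.shape[1] != len(dtype.names):
        raise ValueError("need a 2D array with one column per field")
    out = np.empty(table.shape[0], dtype=dtype)
    for index, name in enumerate(dtype.names):
        # Cast the whole column to the field's type, then store it.
        out[name] = table[:, index].astype(dtype[name])
    return out

mydata = np.array([['text1', 1, 'longertext1', 0.1111],
                   ['text2', 2, 'longertext2', 0.2222],
                   ['text3', 3, 'longertext3', 0.3333],
                   ['text4', 4, 'longertext4', 0.4444],
                   ['text5', 5, 'longertext5', 0.5555]])

mydtype = np.dtype([('My short text Column', 'U5'),
                    ('My integer Column', np.int16),
                    ('My long text Column', 'U11'),
                    ('My float Column', np.float32)])

rec = rows_to_structured(mydata, mydtype)
print(rec['My float Column'])
```

Because each column is converted in a single `astype` call, the loop runs once per field rather than once per row.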
