How to create a numpy 2d array from database with null values

Question

I'm trying to work with 2D arrays whose columns can be accessed by name in Python. The data come from a database and may contain different types and null values. None is not allowed in the tuples, so I tried replacing it with numpy.nan.

This piece of code works if there are no null values in the database. My final goal is to have a masked array, but I cannot even create a plain array.

import MySQLdb
import numpy

connection = MySQLdb.connect(host=server, user=user, passwd=password, db=db)
cursor = connection.cursor()
cursor.execute(query)
results = list(cursor.fetchall())

dt = [('cig', int), ('u_CIG', 'S10'), ('e_ICO', float), ('VCO', int)]

for index_r, row in enumerate(results):
    newrow = list(row)
    for index_c, col in enumerate(newrow):
        if col is None:
            newrow[index_c] = numpy.nan
    results[index_r] = tuple(newrow)
x = numpy.array(results, dtype=dt)

The resulting error is:

x = numpy.array(results, dtype=dt)
ValueError: cannot convert float NaN to integer

After performing fetchall, results contain something like:

[(10L, '*', Decimal('3.47'), 180L),
 (27L, ' ', Decimal('7.21'), None)]

Any idea how I can solve this problem? Thank you!

Jblasco · Accepted Answer · 2013-10-04 08:56:31Z

Building on larsmans' example, I think what you want is:

    import numpy as np
    import numpy.ma as ma

    values = [('<', 2, 3.5, 'as', 6), (None, None, 6.888893, 'bb', 9),
              ('a', 66, 77, 'sdfasdf', 45)]
    nrows = len(values)

    # np.int and np.float were removed in NumPy 1.24; use the builtins instead.
    arr = ma.zeros(nrows, dtype=[('c1', 'S1'), ('c2', int), ('c3', float),
                                 ('c4', 'S8'), ('c5', int)])

    # Mask each None cell and copy every other value into the underlying data.
    for i, row in enumerate(values):
        for j, cell in enumerate(row):
            if cell is None:
                arr.mask[i][j] = True
            else:
                arr.data[i][j] = cell

    print(arr)
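
For the rows shown in the question, the same idea could be wrapped up roughly as below. This is only a sketch under some assumptions: `rows_to_masked` is a hypothetical helper name, the sample rows are copied from the question's fetchall output (without the Python 2 `L` suffix), and Decimal values are converted to float explicitly. It builds the data and the mask as two separate structured arrays and combines them at the end.

```python
from decimal import Decimal

import numpy as np
import numpy.ma as ma


def rows_to_masked(rows, dtype):
    """Hypothetical helper: turn DB rows (tuples that may hold None) into a
    masked structured array."""
    dtype = np.dtype(dtype)
    data = np.zeros(len(rows), dtype=dtype)
    # One boolean flag per field; True means "this cell was NULL".
    mask = np.zeros(len(rows), dtype=[(name, bool) for name in dtype.names])
    for i, row in enumerate(rows):
        for name, cell in zip(dtype.names, row):
            if cell is None:
                mask[name][i] = True
            elif isinstance(cell, Decimal):
                # Assumption: DECIMAL columns are acceptable as floats.
                data[name][i] = float(cell)
            else:
                data[name][i] = cell
    return ma.array(data, mask=mask)


# Sample rows and dtype taken from the question.
rows = [(10, '*', Decimal('3.47'), 180), (27, ' ', Decimal('7.21'), None)]
dt = [('cig', int), ('u_CIG', 'S10'), ('e_ICO', float), ('VCO', int)]
arr = rows_to_masked(rows, dt)

print(arr['VCO'])         # the second entry is masked
print(arr['VCO'].mean())  # statistics skip the masked cell
```

Columns stay accessible by name (`arr['VCO']`), and the NULL cell never has to be represented as NaN, so the integer columns keep their integer dtype.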

larsmans · Answer 2 · 2013-10-03 11:45:37Z


There is no integer representation of NaN. You can either switch to floating point, or construct the mask while filling the array:

>>> import numpy as np
>>> values = [1, 2, None, 4]
>>> arr = np.empty(len(values), dtype=np.int64)
>>> mask = np.zeros(len(values), dtype=bool)  # np.bool was removed in NumPy 1.24
>>> for i, v in enumerate(values):
...     if v is None:
...         mask[i] = True
...     else:
...         arr[i] = v
...         
>>> np.ma.array(arr, mask=mask)
masked_array(data = [1 2 -- 4],
             mask = [False False  True False],
       fill_value = 999999)
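
The other option, switching to floating point, might look roughly like this. It is a sketch that assumes losing the integer dtype is acceptable for the column: each None becomes NaN, and `np.ma.masked_invalid` then masks the NaN entries.

```python
import numpy as np

values = [1, 2, None, 4]

# Replace None by NaN explicitly, then store everything as float.
as_float = np.array([np.nan if v is None else v for v in values], dtype=float)

# Mask every NaN (and inf) entry.
masked = np.ma.masked_invalid(as_float)

print(masked)
print(masked.sum())  # the masked entry is ignored
```

The price is that the values are now floats, which is why building the mask by hand is the better fit when the column has to stay integer.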


But the problem comes when using tuples. If you define a list of tuples, it throws exceptions, e.g. values = [(1, 2), (None, 4)]; arr = np.empty(len(values), dtype=[('c1', np.int64), ('c2', np.int64)]) – tetrarquis Oct 3 '13 at 12:10

@tetrarquis: then use np.empty((len(values), len(values[0]))). What I posted is just an example, you'll have to adapt it to your use case. – larsmans Oct 3 '13 at 12:59
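
A sketch of the 2D adaptation suggested in the comment above, using the values from tetrarquis's example. It assumes every column shares one integer dtype, so the result is a plain 2D masked array without column names; `np.zeros` is used instead of `np.empty` only so that the masked cell holds a defined value.

```python
import numpy as np

values = [(1, 2), (None, 4)]
shape = (len(values), len(values[0]))

arr = np.zeros(shape, dtype=np.int64)
mask = np.zeros(shape, dtype=bool)

# Fill the data cell by cell, flagging each None in the mask instead.
for i, row in enumerate(values):
    for j, v in enumerate(row):
        if v is None:
            mask[i, j] = True
        else:
            arr[i, j] = v

result = np.ma.array(arr, mask=mask)
print(result)
print(result[:, 0].mean())  # column mean over the unmasked cells only
```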
