
I'm trying to work with 2D arrays that can be accessed by column name in Python. The data come from a database and may contain different types and null values. None is not accepted in the tuples, so I tried replacing it with np.nan.

This piece of code works if there are no null values in the database. My final goal is a masked array, but I cannot even create a plain array.

import MySQLdb
import numpy

connection = MySQLdb.connect(host=server, user=user, passwd=password, db=db)
cursor = connection.cursor()
cursor.execute(query)
results = list(cursor.fetchall())

dt = [('cig', int), ('u_CIG', 'S10'), ('e_ICO', float), ('VCO', int)]

for index_r, row in enumerate(results):
    newrow = list(row)
    for index_c, col in enumerate(newrow):
        if col is None:
            newrow[index_c] = numpy.nan
    results[index_r] = tuple(newrow)
x = numpy.array(results, dtype=dt)

The resulting error is:

x = numpy.array(results, dtype=dt)
ValueError: cannot convert float NaN to integer

After fetchall, results contains something like:

[(10L, '*', Decimal('3.47'), 180L),
 (27L, ' ', Decimal('7.21'), None)]

Any idea how I can solve this problem? Thank you!


2 Answers


Building on larsmans' example, I think what you want is:

    import numpy.ma as ma

    values = [('<', 2, 3.5, 'as', 6), (None, None, 6.888893, 'bb', 9),
              ('a', 66, 77, 'sdfasdf', 45)]
    nrows = len(values)

    # The builtin int and float replace np.int and np.float, which were
    # plain aliases and are no longer available in recent NumPy releases.
    arr = ma.zeros(nrows, dtype=[('c1', 'S1'), ('c2', int), ('c3', float),
                                 ('c4', 'S8'), ('c5', int)])

    for i, row in enumerate(values):
        for j, cell in enumerate(row):
            if cell is None:
                arr.mask[i][j] = True   # mask the missing cell
            else:
                arr.data[i][j] = cell   # store the value in its field

    print(arr)
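
Applied to rows shaped like the fetchall() output in the question, the same idea might look like the sketch below. The rows are hypothetical sample data, the Decimal values are converted to float explicitly, and the data and mask are filled as two plain structured arrays before being combined, which is only one of several ways to do it.

```python
from decimal import Decimal

import numpy as np
import numpy.ma as ma

# Hypothetical rows, shaped like the fetchall() result in the question.
rows = [(10, '*', Decimal('3.47'), 180),
        (27, ' ', Decimal('7.21'), None)]

dt = [('cig', int), ('u_CIG', 'S10'), ('e_ICO', float), ('VCO', int)]

# Plain structured array for the values, plus a boolean mask with the
# same field names. Masked slots simply keep their zero placeholder.
data = np.zeros(len(rows), dtype=dt)
mask = np.zeros(len(rows), dtype=[(name, bool) for name, _ in dt])

for i, row in enumerate(rows):
    for name, value in zip(data.dtype.names, row):
        if value is None:
            mask[name][i] = True          # NULL in the database
        elif isinstance(value, Decimal):
            data[name][i] = float(value)  # explicit Decimal -> float
        else:
            data[name][i] = value

arr = ma.array(data, mask=mask)
print(arr['VCO'])   # the second entry is masked
```

Columns are still addressed by name (arr['cig'], arr['VCO'], ...), and no NaN ever has to be stored in an integer field.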

There is no integer representation of NaN. You can either switch to floating point, or construct the mask while filling the array:

>>> import numpy as np
>>> values = [1, 2, None, 4]
>>> arr = np.empty(len(values), dtype=np.int64)
>>> mask = np.zeros(len(values), dtype=bool)
>>> for i, v in enumerate(values):
...     if v is None:
...         mask[i] = True
...     else:
...         arr[i] = v
...
>>> np.ma.array(arr, mask=mask)
masked_array(data = [1 2 -- 4],
             mask = [False False  True False],
       fill_value = 999999)
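
The other option mentioned above, switching to floating point, could be sketched as follows. This assumes it is acceptable to store the integers as floats. The names as_float and masked are just illustrative.

```python
import numpy as np

values = [1, 2, None, 4]

# Replace None by NaN explicitly; this only works because the dtype is float.
as_float = np.array([np.nan if v is None else v for v in values], dtype=float)

# masked_invalid masks every NaN (and inf) entry.
masked = np.ma.masked_invalid(as_float)
print(masked.sum())   # masked entries are skipped in the sum
```

The trade-off is that the column is no longer an integer column, so this fits best when the values are going to be used in floating-point arithmetic anyway.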
    
But the problem arises with tuples. A list of tuples raises an exception, e.g. values = [(1, 2), (None, 4)]; arr = np.empty(len(values), dtype=[('c1', np.int64), ('c2', np.int64)]) –  tetrarquis Oct 3 '13 at 12:10
    
@tetrarquis: then use np.empty((len(values), len(values[0]))). What I posted is just an example; you'll have to adapt it to your use case. –  larsmans Oct 3 '13 at 12:59
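
For the two-column example in the comments, the 2D variant might look like this sketch. It assumes every column holds integers, so a single np.int64 dtype is enough and the column names are dropped. np.zeros is used instead of np.empty only so that the masked slots have a defined value.

```python
import numpy as np

values = [(1, 2), (None, 4)]

# One row per tuple, one column per tuple element.
arr = np.zeros((len(values), len(values[0])), dtype=np.int64)
mask = np.zeros(arr.shape, dtype=bool)

for i, row in enumerate(values):
    for j, v in enumerate(row):
        if v is None:
            mask[i, j] = True   # missing value: mask it, keep the placeholder
        else:
            arr[i, j] = v

masked = np.ma.array(arr, mask=mask)
print(masked)
```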
