I'm using ctypes bit fields to dissect tightly packed binary data. I stuff a record's worth of data into a union as a string, then pull out key fields as integers.
This works well when the buffer contains no null bytes, but any embedded null causes ctypes to truncate the string.
Example:
from ctypes import *

class H(BigEndianStructure):
    _fields_ = [ ('f1', c_int, 8),
                 ('f2', c_int, 8),
                 ('f3', c_int, 8),
                 ('f4', c_int, 2)
                 # ...
               ]

class U(Union):
    _fields_ = [ ('fld', H),
                 ('buf', c_char * 6)
               ]

# With no nulls, works as expected...
u1 = U()
u1.buf = 'abcabc'
print '{} {} {} (expect: 97 98 99)'.format(u1.fld.f1, u1.fld.f2, u1.fld.f3)

# Embedded null breaks it... This prints '97 0 0', NOT '97 0 99'
u2 = U()
u2.buf = 'a\x00cabc'
print '{} {} {} (expect: 97 0 99)'.format(u2.fld.f1, u2.fld.f2, u2.fld.f3)
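To make the expected values concrete, here is a small Python 3 sketch that recomputes the fields by hand from the raw bytes, with no ctypes involved. It assumes `H` holds only the four bit fields shown (the `# ...` is ignored), packed most-significant-bit first into one big-endian 32-bit `c_int`; `sign_extend` and `decode_h` are hypothetical helper names. Because the fields are declared as signed `c_int`, the sketch sign-extends each one the way ctypes reads signed bit fields.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement int,
    the way a bit field declared with a signed type (c_int) is read."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_h(record):
    """Decode f1..f4 from the first 4 bytes of `record` by plain arithmetic.

    Assumption: the layout is the question's H, i.e. bit fields allocated
    from the most significant bit of a single big-endian 32-bit int.
    """
    word = int.from_bytes(record[:4], "big")
    f1 = sign_extend(word >> 24, 8)   # bits 31..24 -> first byte
    f2 = sign_extend(word >> 16, 8)   # bits 23..16 -> second byte
    f3 = sign_extend(word >> 8, 8)    # bits 15..8  -> third byte
    f4 = sign_extend(word >> 6, 2)    # bits 7..6   -> top 2 bits of fourth byte
    return f1, f2, f3, f4


print(decode_h(b"abcabc"))      # (97, 98, 99, 1)
print(decode_h(b"a\x00cabc"))   # (97, 0, 99, 1)
```

The null byte only zeroes `f2`; nothing in the layout itself makes `f3` depend on it. The '97 0 0' result therefore points at how the bytes get into the buffer, not at how the fields are read.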
Browsing the ctypes source, I see two functions for setting a char array: CharArray_set_value() and CharArray_set_raw(). It appears that CharArray_set_raw() handles nulls properly, whereas CharArray_set_value() does not.
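As a quick check of the `raw` side of this, the minimal Python 3 sketch below (bytes instead of str) works on a standalone `c_char * 6` array object, which does expose the `raw` and `value` properties. It shows that assigning through `raw` keeps every byte, while reading through `value` stops at the first NUL.

```python
from ctypes import c_char, sizeof

Buf6 = c_char * 6
a = Buf6()

a.raw = b"a\x00cabc"   # raw setter copies every byte, NULs included
print(a.raw)           # b'a\x00cabc'
print(a.value)         # b'a'  (the value getter stops at the first NUL)
print(sizeof(a))       # 6
```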
But I can't figure out how to invoke the raw version... It looks like a property, so I'd expect something like:
u1.buf.raw = 'abcabc'
but that yields:
AttributeError: 'str' object has no attribute 'raw'
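The error fits how ctypes handles `c_char` array fields: reading `u1.buf` goes through the union's field descriptor, which hands back a new str (bytes in Python 3) rather than the array object, so there is no `.raw` to reach. One way to get at the raw setter anyway, sketched here in Python 3 as an assumption rather than an established recipe, is to build a second `c_char * 6` array that aliases the union's own memory with `from_buffer` (which shares memory rather than copying) at the field's offset, `U.buf.offset`, and assign to that array's `raw`. `load_raw` is a hypothetical helper name.

```python
from ctypes import BigEndianStructure, Union, c_char, c_int


class H(BigEndianStructure):
    _fields_ = [("f1", c_int, 8),
                ("f2", c_int, 8),
                ("f3", c_int, 8),
                ("f4", c_int, 2)]


class U(Union):
    _fields_ = [("fld", H),
                ("buf", c_char * 6)]


def load_raw(u, data):
    """Copy `data` byte-for-byte into u.buf, embedded NULs included.

    u.buf returns a bytes copy, so instead create a c_char array that
    shares u's memory (from_buffer does not copy) at the buf field's
    offset, and use that array's raw setter.
    """
    view = (c_char * 6).from_buffer(u, U.buf.offset)
    view.raw = data   # raises ValueError if len(data) > 6
    return u


u2 = load_raw(U(), b"a\x00cabc")
print(u2.fld.f1, u2.fld.f2, u2.fld.f3)   # 97 0 99
```

Creating the view costs one small object per call; if the union is reused for every record, the view could be created once and kept alongside it.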
Any guidance appreciated. (Including a completely different approach!)
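A different approach, sketched here in Python 3 under the assumption that only the bit fields are needed, is to drop the string member and the union entirely and build the structure straight from the bytes with the class method `from_buffer_copy`. It copies `sizeof(H)` bytes from the start of the input without any string conversion, so NULs are preserved. `parse_record` is a hypothetical helper name.

```python
from ctypes import BigEndianStructure, c_int, sizeof


class H(BigEndianStructure):
    _fields_ = [("f1", c_int, 8),
                ("f2", c_int, 8),
                ("f3", c_int, 8),
                ("f4", c_int, 2)]


def parse_record(record):
    """Build an H directly from raw bytes: no union, no string setter."""
    if len(record) < sizeof(H):
        raise ValueError("record shorter than sizeof(H)")
    # from_buffer_copy copies the first sizeof(H) bytes verbatim.
    return H.from_buffer_copy(record)


h = parse_record(b"a\x00cabc")
print(h.f1, h.f2, h.f3)   # 97 0 99
```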
(Note: I need to process thousands of records per second, so efficiency is critical. Using a list comprehension to fill a byte array in the structure works, but it's 100x slower.)
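Since throughput matters, one option (a Python 3 sketch, not measured here) is to read many records from one large buffer and pass an offset to `from_buffer_copy`, so each record costs one C-level copy of `sizeof(H)` bytes and no per-byte Python work. `RECORD_SIZE` is a hypothetical constant: it is taken to be 6 to match `buf`, but the real record length may differ and must be at least `sizeof(H)`. `iter_fields` is likewise a hypothetical name; timing it against the list-comprehension version on real data would be the way to confirm any speedup.

```python
from ctypes import BigEndianStructure, c_int, sizeof


class H(BigEndianStructure):
    _fields_ = [("f1", c_int, 8),
                ("f2", c_int, 8),
                ("f3", c_int, 8),
                ("f4", c_int, 2)]


RECORD_SIZE = 6   # hypothetical: replace with the real record length


def iter_fields(blob, record_size=RECORD_SIZE):
    """Yield (f1, f2, f3) for each fixed-size record in `blob`."""
    assert record_size >= sizeof(H), "each record must cover H"
    # Step through whole records only; a trailing partial record is skipped.
    for offset in range(0, len(blob) - record_size + 1, record_size):
        h = H.from_buffer_copy(blob, offset)   # copies sizeof(H) bytes
        yield h.f1, h.f2, h.f3


blob = b"abcabc" + b"a\x00cabc"
print(list(iter_fields(blob)))   # [(97, 98, 99), (97, 0, 99)]
```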