
I'm using ctypes bit fields to dissect tightly packed binary data. I stuff a record's worth of data into a union as a string, then pull out key fields as integers.

This works great when there are no nulls in the buffer, but any embedded null byte causes ctypes to truncate the string at that point.

Example:

from ctypes import *

class H(BigEndianStructure):
    _fields_ = [ ('f1', c_int, 8),
                 ('f2', c_int, 8),
                 ('f3', c_int, 8),
                 ('f4', c_int, 2)
                 # ...
                 ]

class U(Union):
    _fields_ = [ ('fld', H),
                 ('buf', c_char * 6)
                 ]

# With no nulls, works as expected...
u1 = U()
u1.buf='abcabc'
print '{} {} {} (expect: 97 98 99)'.format(u1.fld.f1, u1.fld.f2, u1.fld.f3)

# Embedded null breaks it...  This prints '97 0 0', NOT '97 0 99'
u2 = U()
u2.buf='a\x00cabc'
print '{} {} {} (expect: 97 0 99)'.format(u2.fld.f1, u2.fld.f2, u2.fld.f3)

Browsing the ctypes source, I see two methods to set a char array, CharArray_set_value() and CharArray_set_raw(). It appears that CharArray_set_raw() will handle nulls properly whereas CharArray_set_value() will not.
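The difference is easy to see in a minimal sketch. It is written for Python 3, so the buffer is `bytes` rather than `str`, and `H` and `U` mirror the question's classes with the elided fields dropped. Assigning through the union field stops copying at the first NUL. The `.raw` attribute of a standalone `c_char` array copies every byte.

```python
from ctypes import BigEndianStructure, Union, c_char, c_int

class H(BigEndianStructure):
    _fields_ = [('f1', c_int, 8),
                ('f2', c_int, 8),
                ('f3', c_int, 8),
                ('f4', c_int, 2)]

class U(Union):
    _fields_ = [('fld', H),
                ('buf', c_char * 6)]

# Through the union field: the setter stops copying at the first NUL.
u = U()
u.buf = b'a\x00cabc'
print(bytes(u)[:6])                    # b'a\x00\x00\x00\x00\x00'
print(u.fld.f1, u.fld.f2, u.fld.f3)    # 97 0 0

# Standalone array: .raw copies all six bytes, .value reads up to the first NUL.
arr = (c_char * 6)()
arr.raw = b'a\x00cabc'
print(arr.raw)                         # b'a\x00cabc'
print(arr.value)                       # b'a'
```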

But I can't figure out how to invoke the raw version... It looks like a property, so I'd expect something like:

u1.buf.raw = 'abcabc'

but that yields:

AttributeError: 'str' object has no attribute 'raw'

Any guidance appreciated. (Including a completely different approach!)

(Note: I need to process thousands of records per second, so efficiency is critical. Using a list comprehension to stuff a byte array into the structure works, but it's 100x slower.)
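One completely different approach is to skip the union and the string assignment altogether. `Structure.from_buffer_copy()` copies `sizeof(H)` bytes from any bytes-like object, starting at an optional offset, so embedded NULs never pass through a string setter. This is a minimal Python 3 sketch. `RECORD_LEN` and the `iter_headers()` helper are hypothetical and assume fixed 6-byte records as in the question. Whether it meets the throughput target would need to be measured, for example with `timeit`.

```python
from ctypes import BigEndianStructure, c_int

class H(BigEndianStructure):
    _fields_ = [('f1', c_int, 8),
                ('f2', c_int, 8),
                ('f3', c_int, 8),
                ('f4', c_int, 2)]

RECORD_LEN = 6  # hypothetical: fixed record size, taken from the question's c_char * 6

def iter_headers(blob):
    """Yield one H per fixed-length record in blob (hypothetical helper)."""
    for offset in range(0, len(blob) - RECORD_LEN + 1, RECORD_LEN):
        # Copies sizeof(H) raw bytes starting at offset; NUL bytes are copied like any other.
        yield H.from_buffer_copy(blob, offset)

blob = b'abcabc' + b'a\x00cabc'
for h in iter_headers(blob):
    print(h.f1, h.f2, h.f3)
# 97 98 99
# 97 0 99
```

With a writable source such as a `bytearray` or `mmap`, `H.from_buffer(buf, offset)` avoids even the copy. The resulting object keeps the source buffer alive for as long as the object exists.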


A c_char * 6 field is, unfortunately, handled as a NUL-terminated string. Switch to c_byte * 6 instead, at the cost of losing the convenience of initializing it with a string:

from ctypes import *

class H(BigEndianStructure):
    _fields_ = [ ('f1', c_int, 8),
                 ('f2', c_int, 8),
                 ('f3', c_int, 8),
                 ('f4', c_int, 2)
                 # ...
                 ]

class U(Union):
    _fields_ = [ ('fld', H),
                 ('buf', c_byte * 6)
                 ]

u1 = U()
u1.buf=(c_byte*6)(97,98,99,97,98,99)
print '{} {} {} (expect: 97 98 99)'.format(u1.fld.f1, u1.fld.f2, u1.fld.f3)

u2 = U()
u2.buf=(c_byte*6)(97,0,99,97,98,99)
print '{} {} {} (expect: 97 0 99)'.format(u2.fld.f1, u2.fld.f2, u2.fld.f3)

Output:

97 98 99 (expect: 97 98 99)
97 0 99 (expect: 97 0 99)
Thanks, Mark. This works, but the CPU overhead of marshaling the bytes from the string into the byte array makes this approach too slow for my application. (In my trials it was about 100x slower than the memcpy() ctypes does when assigning a string.) – Simian Oct 13 '14 at 17:20
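One way to fill the union without building a `c_byte` array element by element is to copy the record straight into the union's memory with `ctypes.memmove()`. This is a minimal Python 3 sketch, and `load_record()` is a hypothetical helper. Because the copy bypasses the field setter entirely, it should behave the same with the original `c_char * 6` field.

```python
from ctypes import BigEndianStructure, Union, addressof, c_byte, c_int, memmove

class H(BigEndianStructure):
    _fields_ = [('f1', c_int, 8),
                ('f2', c_int, 8),
                ('f3', c_int, 8),
                ('f4', c_int, 2)]

class U(Union):
    _fields_ = [('fld', H),
                ('buf', c_byte * 6)]

def load_record(u, record):
    """Copy a 6-byte record into u's storage (hypothetical helper)."""
    if len(record) != 6:
        raise ValueError('expected a 6-byte record')
    # memmove copies exactly len(record) bytes, NULs included.
    memmove(addressof(u), record, len(record))

u2 = U()
load_record(u2, b'a\x00cabc')
print(u2.fld.f1, u2.fld.f2, u2.fld.f3)   # 97 0 99
```

Reusing one `U` instance across records also avoids allocating a new ctypes object per record.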

You can also create the raw-string array outside of your struct/union, as a standalone c_char array that shares the union's memory:

mystring = (c_char * 6).from_buffer(u2)
print mystring.raw

This way you don't have any overhead for conversion. I wonder why a (c_char * 6) behaves differently when used alone vs. used in a Structure/Union...
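The same view can be used for writing, which is what the question was after. Assigning to `.raw` on a `c_char` array created with `from_buffer()` goes through the array's raw setter, so embedded NULs are kept. This is a minimal Python 3 sketch. It creates the view once and reuses it for every record; that reuse is a design choice made here to avoid a per-record allocation.

```python
from ctypes import BigEndianStructure, Union, c_char, c_int

class H(BigEndianStructure):
    _fields_ = [('f1', c_int, 8),
                ('f2', c_int, 8),
                ('f3', c_int, 8),
                ('f4', c_int, 2)]

class U(Union):
    _fields_ = [('fld', H),
                ('buf', c_char * 6)]

u2 = U()
# A c_char array that shares memory with u2 (from_buffer keeps u2's buffer alive).
raw_view = (c_char * 6).from_buffer(u2)

for record in (b'abcabc', b'a\x00cabc'):
    raw_view.raw = record            # copies all 6 bytes, NULs included
    print(u2.fld.f1, u2.fld.f2, u2.fld.f3)
# 97 98 99
# 97 0 99
```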

For convenience (usually), the CField descriptors for c_char and c_wchar arrays are special-cased in PyCField_FromDesc (in Modules/_ctypes/cfield.c) to convert to and from native Python strings using s_get / s_set and U_get / U_set. – eryksun Dec 11 '15 at 14:17
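That special case can be observed directly. This minimal Python 3 sketch uses a hypothetical union with one field of each type. Reading the `c_char` array field yields a plain `bytes` object, which is why `u1.buf.raw` raises `AttributeError`. Reading a `c_byte` array field yields a ctypes array that shares memory with the union.

```python
from ctypes import Union, c_byte, c_char

class U(Union):
    _fields_ = [('cbuf', c_char * 6),   # converted to/from bytes by the field descriptor
                ('bbuf', c_byte * 6)]   # returned as a ctypes array

u = U()
print(type(u.cbuf))                     # <class 'bytes'>
print(hasattr(u.cbuf, 'raw'))           # False
print(isinstance(u.bbuf, c_byte * 6))   # True

# The c_byte array writes straight into the union's memory.
u.bbuf[0] = 97
print(u.cbuf)                           # b'a'
```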
Don't use from_address for this, since the resulting array doesn't own a reference to the source buffer, u2. This is a recipe for segfault disaster. Use (c_char * 6).from_buffer(u2). – eryksun Dec 11 '15 at 14:23
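This minimal Python 3 sketch shows why `from_buffer()` is the safer choice. The returned array holds a reference to the source object's buffer, so the memory stays valid even after the last other reference to the union is gone. `make_view()` is a hypothetical helper. The `from_address()` variant is deliberately not shown, because reading through it after the union is freed is undefined behavior.

```python
import gc
from ctypes import Union, c_char, c_int

class U(Union):
    _fields_ = [('n', c_int),
                ('buf', c_char * 6)]

def make_view(record):
    """Return a c_char view onto a temporary union (hypothetical helper)."""
    u = U()
    view = (c_char * 6).from_buffer(u)   # the view keeps a reference to u's buffer
    view.raw = record
    return view                          # the local name u goes away here

view = make_view(b'a\x00cabc')
gc.collect()
print(view.raw)   # b'a\x00cabc'
```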
    
In cases such as this I prefer to make the field name private (e.g. _buf) and use a public property. – eryksun Dec 11 '15 at 14:24
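This minimal Python 3 sketch follows that suggestion. The storage field is renamed `_buf`, and a public `buf` property copies whole records in and out through a `from_buffer()` view, so embedded NULs survive in both directions. The `_BUF_LEN` constant and the length check are assumptions added for illustration.

```python
from ctypes import BigEndianStructure, Union, c_char, c_int

class H(BigEndianStructure):
    _fields_ = [('f1', c_int, 8),
                ('f2', c_int, 8),
                ('f3', c_int, 8),
                ('f4', c_int, 2)]

_BUF_LEN = 6  # hypothetical constant matching the c_char * 6 field

class U(Union):
    _fields_ = [('fld', H),
                ('_buf', c_char * _BUF_LEN)]   # private storage; its descriptor truncates at NUL

    @property
    def buf(self):
        # Read all bytes, including embedded NULs.
        return (c_char * _BUF_LEN).from_buffer(self).raw

    @buf.setter
    def buf(self, value):
        if len(value) != _BUF_LEN:
            raise ValueError('expected %d bytes, got %d' % (_BUF_LEN, len(value)))
        # Write through a raw view instead of the truncating field setter.
        (c_char * _BUF_LEN).from_buffer(self).raw = value

u2 = U()
u2.buf = b'a\x00cabc'
print(u2.buf)                            # b'a\x00cabc'
print(u2.fld.f1, u2.fld.f2, u2.fld.f3)   # 97 0 99
```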

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.