I'm trying to read a binary file (which represents a matrix in Matlab) in Python. But I am having trouble reading the file and converting the bytes to the correct values.

The binary file consists of a sequence of 4-byte numbers. The first two numbers are the number of rows and columns respectively. My friend gave me a Matlab function he wrote that creates this file using fwrite. I would like to read it with something like this:

f = open(filename, 'rb')
rows = f.read(4)
cols = f.read(4)
m = [[0 for c in cols] for r in rows]
r = c = 0
while True:
    if c == cols:
        r += 1
        c = 0
    num = f.read(4)
    if num:
        m[r][c] = num
        c += 1
    else:
        break

But whenever I use f.read(4), I get something like '\x00\x00\x00\x04' (this specific example should represent a 4), and I can't figure out how to convert it into the correct number (using int, hex or anything like that doesn't work). I stumbled upon struct.unpack, but that didn't seem to help very much.

Here is an example matrix and the corresponding binary file (as it appears when I read the entire file using the Python function f.read() without any size parameter) that the Matlab function created for it:

4     4     2     4
2     2     2     1
3     3     2     4
2     2     6     2

'\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00'

So the first 4 bytes and the 5th-8th bytes should both be 4, as the matrix is 4x4, and then the values should be 4,4,2,4,2,2,2,1, etc.

Thanks guys!

The struct module is your friend. It might take you a little bit to get used to, but it is a very powerful tool. –  Nick Bastin Jul 1 '10 at 22:46

I looked a bit more into your problem; I had never used struct before, so it was a good learning exercise. It turns out there are a couple of twists. First, the matrix entries are not stored as 4-byte integers but as 4-byte floats in big-endian form (only the two header values are integers). Second, if your example is correct, the matrix was not stored by rows, as one would expect, but by columns. That is, it was written like so (pseudocode):

for j in cols:
  for i in rows:
    write Aij to file
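
To make the column-by-column layout concrete, here is a rough Python sketch of a writer that would produce such a file. It is only an illustration of the layout inferred from the example, not the Matlab function itself; the names write_matrix and example are made up for this sketch.

```python
import struct

def write_matrix(matrix):
    """Serialize a list of rows in the layout inferred from the example:
    big-endian int32 row and column counts, then big-endian float32
    entries written one column at a time (an assumption, see above)."""
    rows = len(matrix)
    cols = len(matrix[0])
    out = struct.pack('>ii', rows, cols)
    for j in range(cols):          # outer loop over columns
        for i in range(rows):      # inner loop over rows
            out += struct.pack('>f', float(matrix[i][j]))
    return out

example = [[4, 4, 2, 4],
           [2, 2, 2, 1],
           [3, 3, 2, 4],
           [2, 2, 6, 2]]
encoded = write_matrix(example)
```

For the example matrix, the first column (4, 2, 3, 2) comes out right after the 8-byte header, which matches the start of the dump in the question.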

So I had to transpose the result after reading. Here is the code that you need given the example:

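import struct

def readMatrix(f):
    # Header: two big-endian 4-byte integers (row count, then column count).
    rows, cols = struct.unpack('>ii', f.read(8))
    # The data is stored column by column, so each chunk of `rows`
    # big-endian 4-byte floats is one column of the matrix.
    m = [list(struct.unpack('>%df' % rows, f.read(4 * rows)))
         for c in range(cols)]
    # Transpose the list of columns to get a list of rows (as tuples).
    return list(zip(*m))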

And here we test it:

>>> from StringIO import StringIO
>>> f = StringIO('\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00')
>>> mat = readMatrix(f)
>>> for row in mat:
...     print row
...     
(4.0, 4.0, 2.0, 4.0)
(2.0, 2.0, 2.0, 1.0)
(3.0, 3.0, 2.0, 4.0)
(2.0, 2.0, 6.0, 2.0)
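
The session above is Python 2 (StringIO and the print statement). A minimal sketch of the same approach for Python 3 follows, assuming the file layout inferred in this answer: two big-endian int32 header values, then big-endian float32 entries stored column by column. The names read_matrix and data are illustrative and not part of the original post.

```python
import io
import struct

def read_matrix(f):
    """Read a matrix from a binary file object opened in 'rb' mode."""
    # Header: row count and column count as big-endian 4-byte integers.
    rows, cols = struct.unpack('>ii', f.read(8))
    # Each chunk of `rows` floats is one column of the matrix.
    columns = [struct.unpack('>%df' % rows, f.read(4 * rows))
               for _ in range(cols)]
    # Transpose the columns into a list of rows.
    return [list(row) for row in zip(*columns)]

# The example bytes from the question, as a bytes literal.
data = (b'\x00\x00\x00\x04\x00\x00\x00\x04'
        b'@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00'
        b'@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00'
        b'@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00'
        b'@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00')

for row in read_matrix(io.BytesIO(data)):
    print(row)
```

With a real file, passing open(filename, 'rb') in place of io.BytesIO(data) should work the same way, since both return bytes from read().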
Your answer was better, my apologies. However, I don't know if it was just my machine, but I had to use "!" instead of ">" for struct.unpack –  Daniel Waltrip Jul 19 '10 at 19:19
    
@Daniel: hm, that's weird if '!' and '>' give you different results; it seems to me they should be the same. The documentation says: 'The form "!" [network order = big-endian] is available for those poor souls who claim they can't remember whether network byte order is big-endian [">"] or little-endian ["<"].' But if it works, don't touch it; it ain't broken :) –  Nas Banov Jul 20 '10 at 7:31
rows = f.read(4)
cols = f.read(4)

After these two reads, both names are bound to 4-byte strings. To turn them into integers, use struct.unpack:

import struct

rowsandcols = f.read(8)
rows, cols = struct.unpack('=ii', rowsandcols)

See the docs for struct.unpack.

It didn't work for me =/ >>> import struct >>> f = open('Z:\summer reu 2010\m.dat','rb') >>> rowsandcols = f.read(8) >>> rows, cols = struct.unpack('=ii',rowsandcols) >>> rows 67108864 >>> cols 67108864 rows and cols should both be 4 –  Daniel Waltrip Jul 1 '10 at 23:08
    
gahh i can't format my comment. Here is a screenshot: i47.tinypic.com/14ub18n.jpg –  Daniel Waltrip Jul 1 '10 at 23:12
Considering the data is described as being big-endian and that most popular CPUs today are little-endian, perhaps it should be ! or > instead of = ? –  Nas Banov Jul 1 '10 at 23:12
    
yes that worked Nas. Can someone please explain what all these different formats actually mean? What is big-endian/small-endian and native/standard? –  Daniel Waltrip Jul 1 '10 at 23:14
    
I looked "endianness" up on wikipedia, sorry to bother you all. Thank you very much for the help! =) –  Daniel Waltrip Jul 1 '10 at 23:18
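
As a rough illustration of the byte-order prefixes discussed in these comments, the sketch below unpacks the same four bytes with each prefix. The variable names are made up for this example; the bytes are the first four bytes of the file from the question.

```python
import struct
import sys

raw = b'\x00\x00\x00\x04'  # first four bytes of the example file

big = struct.unpack('>i', raw)[0]      # big-endian: most significant byte first
network = struct.unpack('!i', raw)[0]  # network order, which is big-endian
little = struct.unpack('<i', raw)[0]   # little-endian: least significant byte first
native = struct.unpack('=i', raw)[0]   # this machine's byte order, standard sizes

print(big, network, little, native, sys.byteorder)
```

Here big and network are both 4, while little is 67108864 (0x04000000), the value reported in the first comment. The result for '=' depends on the machine: on a little-endian CPU it matches little, which would explain why '=ii' gave the wrong header values.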
