Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I've written a small suite to easily convert my data between binary, hexadecimal and ASCII:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import sys


def to_list(chain, offset):
    return [chain[i:i+offset] for i in range(0, len(chain), offset)]

def bin2str(chain):
    return ''.join((chr(int(chain[i:i+8], 2)) for i in range(0, len(chain), 8)))

def bin2hex(chain):
    return ''.join((hex(int(chain[i:i+8], 2))[2:] for i in range(0, len(chain), 8)))

def str2bin(chain):
    return ''.join((bin(ord(c))[2:].zfill(8) for c in chain))

def str2hex(chain):
    return ''.join((hex(ord(c))[2:] for c in chain))

def hex2bin(chain):
    return ''.join((bin(int(chain[i:i+2], 16))[2:].zfill(8) for i in range(0, len(chain), 2)))

def hex2str(chain):
    return ''.join((chr(int(chain[i:i+2], 16)) for i in range(0, len(chain), 2)))


if __name__ == '__main__':
    word = sys.argv[1]
    word_bin = str2bin(word)
    word_hex = str2hex(word)

    # tests
    assert bin2str(word_bin) == word
    assert bin2hex(word_bin) == word_hex
    assert hex2str(word_hex) == word
    assert hex2bin(word_hex) == word_bin

    # output
    print "bin:", to_list(word_bin, 8)
    print "hex:", to_list(word_hex, 2)

Example:

$ ./utils.py test
bin: ['01110100', '01100101', '01110011', '01110100']
hex: ['74', '65', '73', '74']

How would you optimize those methods?

Also, I realize I'm actually only working with str representations of the different formats. What would you do to get real binary or hexadecimal results from those functions?

share|improve this question

migrated from stackoverflow.com Mar 26 at 15:56

This question came from our site for professional and enthusiast programmers.

1  
The int 011 (oct) will be 9 in decimal, the int 0x11 (hex) will be 17 in decimal and the int 0b11 (bin) will be 3 in decimal. Just use them as you would normal ints. –  jakekimds Mar 25 at 2:04
2  
You use a lot of list comprehensions when you don't need to. In fact, all of them except to_list()'s return value can be changed to generator expressions without changing anything else. –  Blacklight Shining Mar 25 at 2:06
4  
You can't return actual integers in binary format. To python, they are just all numbers. They will be converted to decimal when you print them if you try to use them as actual integers. –  jakekimds Mar 25 at 2:13
2  
Why write all of this when there are built-in functions made for this exact purpose? To convert a number in base 2-36 to base 10, just use the optional base argument of int(). To convert a number from base 10 to binary, octal, or hexadecimal, use bin(), oct(), or hex() respectively. –  pzp Mar 25 at 2:18
    
@jakekimds I see. Then I'll only use those functions when I'm debugging things, otherwise I'll just convert to base 10 and directly apply binary operators to those numbers. That clarifies things, thanks ! –  Fandekasp Mar 25 at 2:36

1 Answer 1

up vote 2 down vote accepted

Step 1: minor cleanup

You've defined a to_list() function, but aren't really taking advantage of it. Instead, you're still writing things like … for i in range(0, len(chain), 8)) in functions like bin2str. You should change to_list() into a generator. (Then, since it is no longer returning a list, you should also rename it. You'll end up with the same thing as the chunks() function in this Stack Overflow answer.)

chain is an odd parameter name. It strikes me as a botched English translation from a Romance language ("chaîne", "catena", etc.). If we want to help programmers who aren't English speakers, I suggest replacing your …2… names with _to_.

Instead of slicing off the 0b and 0x prefixes generated by the bin() and hex() functions, and zero-padding using zfill(), use str.format() to take care of both problems.

# http://stackoverflow.com/a/312464/1157100
def _chunks(str, chunk_size):
    for i in xrange(0, len(str), chunk_size):
        yield str[i:i+chunk_size]

def hex_to_bin(hex):
    return ''.join('{:08b}'.format(int(x, 16)) for x in _chunks(hex, 2))

def str_to_bin(str):
    return ''.join('{:08b}'.format(ord(c)) for c in str)

def bin_to_hex(bin):
    return ''.join('{:02x}'.format(int(b, 2)) for b in _chunks(bin, 8))

def str_to_hex(str):
    return ''.join('{:02x}'.format(ord(c)) for c in str)

def bin_to_str(bin):
    return ''.join(chr(int(b, 2)) for b in _chunks(bin, 8))

def hex_to_str(hex):
    return ''.join(chr(int(x, 16)) for x in _chunks(hex, 2))

I've rearranged the order of appearance, grouping by the target type rather than the source type, as a segue to the next step…

Step 2: decluttering the interface

Looking at the code above, you'll notice a pattern: each function just mixes-and-matches code. There is always a decoding portion (either … for something in _chunks(input, base) or … for c in str) and an encoding portion (either ''.join('{:format}'.format(something) …) or ''.join(chr(int(something, base)) …)).

A more disturbing observation is that the complexity is exponential. If you wanted to add octal support, for example, you would need to add str_to_oct(), oct_to_str(), bin_to_oct(), oct_to_bin(), hex_to_oct(), and oct_to_hex() — doubling the number of functions.

A better design would be to convert everything to/from a common interface. I've chosen an iterable of ASCII values as that hub.

def _chunks(str, chunk_size):
    for i in xrange(0, len(str), chunk_size):
        yield str[i:i+chunk_size]

def from_str(str):
    for c in str:
        yield ord(c)

def to_str(ascii):
    return ''.join(chr(a) for a in ascii)

def from_bin(bin):
    for chunk in _chunks(bin, 8):
        yield int(chunk, 2)

def to_bin(ascii):
    return ''.join('{:08b}'.format(a) for a in ascii)

def from_hex(hex):
    for chunk in _chunks(hex, 2):
        yield int(chunk, 16)

def to_hex(ascii):
    return ''.join('{:02x}'.format(a) for a in ascii)

Then, instead of writing str2bin(word), you would write to_bin(from_str(word)).

You happen to be at the break-even point: either way, there are six functions. The architecture proposed in Step 2 simplifies your library (at a slight burden to the caller). You would see a payoff, though, if you were to add octal support — you would be adding two functions instead of six.

share|improve this answer
    
Small note, you're using reserved keywords for the arguments. Cannot use str, ascii, bin and hex. Moreover, the arguments are sometimes confusing. The "ascii" argument for the function to_str is actually a list of base10 numbers, not ascii :) –  Fandekasp Mar 29 at 0:13
    
@Fandekasp Those aren't reserved keywords; they are variable names that shadow built-in functions. In general, such shadowing is a bad idea, but I know that I'm not going to call hex() on a hex string, or bin() on a binary string. The parameters named ascii actually do contain ASCII codes. (They are integers, not in any particular base until you choose to print them, and the standard stringification routine uses a base-10 representation.) –  200_success Mar 29 at 3:15
    
I see, thanks for the precision –  Fandekasp Mar 29 at 9:17
    
PEP 0008 notes that, by convention, collisions with keywords are avoided by appending an underscore to the name. Personally, I extend this to include builtins' names. –  Blacklight Shining Mar 29 at 14:45

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.