Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

Is my algorithm slow because it has problems? Or, is any bitwise common prefix solution not going to give me the performance I'm looking for?

After profiling my algorithm, I found that over 60% of the time was spent on one line len(os.path.commonprefix((item1, item2))). I'm only looking for the length of the prefix

To solve this I tried to write a bitwise prefix solution

def bit_prefix(a, b):
    min_len = min(len(a), len(b))
    if min_len > 0:
        x = str(bin(
            int(a[::-1].encode('hex'), 16) ^ int(b[::-1].encode('hex'), 16)))
        y = x.strip('0')
        if len(y) is 1:
            return min_len
        else:
            return (len(x) - len(y)) / 8
    else:
        return 0

I've only gotten a marginal improvement in speed with long prefixes

a = 'a' * 1000000 + 'z'

b = 'a' * 900000 + 'z'

timeit.timeit(lambda: bit_prefix(a, b), number=100)
Out[34]: 6.340372534867129

timeit.timeit(lambda: len(os.path.commonprefix((a, b))), number=100)
Out[35]: 7.5483549056534684

print bit_prefix(a, b), len(os.path.commonprefix((a, b)))
900000 900000

And my algorithm performs more poorly with short prefixes

a = 'aaz'

b = 'az'

timeit.timeit(lambda: bit_prefix(a, b), number=1000000)
Out[42]: 3.968956086175467

timeit.timeit(lambda: len(os.path.commonprefix((a, b))), number=1000000)
Out[43]: 1.1592788235707303

print bit_prefix(a, b), len(os.path.commonprefix((a, b)))
1 1

If my algorithm isn't broken and a bitwise solution won't give me the performance boost I'm looking for, can you refer me to a common prefix solution that would outperform os.path.commonprefix?

Here's the bit_pefix profile

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    29                                           @profile
    30                                           def bit_prefix(a, b):
    31     36140        81099      2.2      4.2      min_len = min(len(a), len(b))
    32     36140        49386      1.4      2.5      if min_len > 0:
    33     36140        49739      1.4      2.6          x = str(
    34     36140        47846      1.3      2.5              bin(
    35     36140        47232      1.3      2.4              int(
    36     36140        89499      2.5      4.6              a[::-1]
    37     36140       242994      6.7     12.5              .encode('hex'),
    38     36140       164442      4.6      8.4              16)
    39                                                       ^ 
    40     36140        49601      1.4      2.5              int(
    41     36140        88745      2.5      4.6              b[::-1]
    42     36140       216488      6.0     11.1              .encode('hex'),
    43     36140       504425     14.0     25.9              16)))
    44     36140       187571      5.2      9.6          y = x.strip('0')
    45     36140        61027      1.7      3.1          if len(y) is 1:
    46                                                       return min_len
    47                                                   else:
    48     36140        67507      1.9      3.5              return (len(x) - len(y)) / 8
    49                                               else:
    50                                                   return 0

Update

I've created another algorithm that seems to be faster than others. Here's the Code Review for that.

share|improve this question
1  
I don't think you'll get anything that is much faster. I thought that using enumerate() with itertools.izip() would be blisteringly fast since they return generators, so a minimum of characters are iterated, but even that came out with just barely better or just barely worse: about the same. – zondo Mar 28 at 12:14
    
@zondo Thanks for izip suggestion, it was better than everything else I found online! I tested it in another Code Review post. – mattkaeo Mar 29 at 5:16

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Browse other questions tagged or ask your own question.