Code Review Stack Exchange is a question and answer site for peer programmer code reviews. Join them; it only takes a minute:

Sign up

Here's how it works:

Anybody can ask a question
Anybody can answer
The best answers are voted up and rise to the top

A bitwise common prefix algorithm in Python

up vote 1 down vote favorite

Is my algorithm slow because it has problems? Or, is any bitwise common prefix solution not going to give me the performance I'm looking for?

After profiling my algorithm, I found that over 60% of the time was spent on one line len(os.path.commonprefix((item1, item2))). I'm only looking for the length of the prefix

To solve this I tried to write a bitwise prefix solution

def bit_prefix(a, b):
    min_len = min(len(a), len(b))
    if min_len > 0:
        x = str(bin(
            int(a[::-1].encode('hex'), 16) ^ int(b[::-1].encode('hex'), 16)))
        y = x.strip('0')
        if len(y) is 1:
            return min_len
        else:
            return (len(x) - len(y)) / 8
    else:
        return 0

I've only gotten a marginal improvement in speed with long prefixes

a = 'a' * 1000000 + 'z'

b = 'a' * 900000 + 'z'

timeit.timeit(lambda: bit_prefix(a, b), number=100)
Out[34]: 6.340372534867129

timeit.timeit(lambda: len(os.path.commonprefix((a, b))), number=100)
Out[35]: 7.5483549056534684

print bit_prefix(a, b), len(os.path.commonprefix((a, b)))
900000 900000

And my algorithm performs more poorly with short prefixes

a = 'aaz'

b = 'az'

timeit.timeit(lambda: bit_prefix(a, b), number=1000000)
Out[42]: 3.968956086175467

timeit.timeit(lambda: len(os.path.commonprefix((a, b))), number=1000000)
Out[43]: 1.1592788235707303

print bit_prefix(a, b), len(os.path.commonprefix((a, b)))
1 1

If my algorithm isn't broken and a bitwise solution won't give me the performance boost I'm looking for, can you refer me to a common prefix solution that would outperform os.path.commonprefix?

Here's the bit_pefix profile

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    29                                           @profile
    30                                           def bit_prefix(a, b):
    31     36140        81099      2.2      4.2      min_len = min(len(a), len(b))
    32     36140        49386      1.4      2.5      if min_len > 0:
    33     36140        49739      1.4      2.6          x = str(
    34     36140        47846      1.3      2.5              bin(
    35     36140        47232      1.3      2.4              int(
    36     36140        89499      2.5      4.6              a[::-1]
    37     36140       242994      6.7     12.5              .encode('hex'),
    38     36140       164442      4.6      8.4              16)
    39                                                       ^ 
    40     36140        49601      1.4      2.5              int(
    41     36140        88745      2.5      4.6              b[::-1]
    42     36140       216488      6.0     11.1              .encode('hex'),
    43     36140       504425     14.0     25.9              16)))
    44     36140       187571      5.2      9.6          y = x.strip('0')
    45     36140        61027      1.7      3.1          if len(y) is 1:
    46                                                       return min_len
    47                                                   else:
    48     36140        67507      1.9      3.5              return (len(x) - len(y)) / 8
    49                                               else:
    50                                                   return 0

Update

I've created another algorithm that seems to be faster than others. Here's the Code Review for that.

edited Mar 29 '16 at 5:12

asked Mar 28 '16 at 10:47

mattkaeo

28719

I don't think you'll get anything that is much faster. I thought that using enumerate() with itertools.izip() would be blisteringly fast since they return generators, so a minimum of characters are iterated, but even that came out with just barely better or just barely worse: about the same. – zondo Mar 28 '16 at 12:14

@zondo Thanks for izip suggestion, it was better than everything else I found online! I tested it in another Code Review post. – mattkaeo Mar 29 '16 at 5:16

add a comment |

Your Answer

Sign up or log in

Post as a guest

Name

Post as a guest

Name

discard

By posting your answer, you agree to the privacy policy and terms of service.

Browse other questions tagged python performance strings python-2.7 bitwise or ask your own question.

question feed

asked	12 months ago
viewed	61 times

current community

your communities

more stack exchange communities

A bitwise common prefix algorithm in Python

Update

Your Answer

Browse other questions tagged python performance strings python-2.7 bitwise or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

A bitwise common prefix algorithm in Python

Update

Can you help? Code Review Stack Exchange depends on everyone sharing their knowledge. If you're able to answer this question, please do!

Your Answer

Sign up or log in

Post as a guest

Browse other questions tagged python performance strings python-2.7 bitwise or ask your own question.

Linked

Related

Hot Network Questions