Efficient algorithm to find the largest run of zeros in a binary string?

Question

I am looking for an efficient algorithm to find the longest run of zeros in a binary string. My implementation is in Python 2.7, but all I require is the idea of the algorithm.

For example, given '0010011010000', the function should return 4.

Thilo · Answer 1 · 2015-06-09 03:59:15Z

up vote 8 down vote

I don't think there is anything better than a single pass over the string, counting the current sequence length (and updating the maximum) as you go along.

If by "binary string" you mean raw bits, you can read them one byte at a time and extract the eight bits in there (using bit shifting or masking). That does not change the overall algorithm or its complexity.

answered 2 hours ago

Thilo
129k43271415

Okay, that's what I feared. I was hoping there was a more efficient implementation. – Soke 2 hours ago

there are probably a few shortcuts if you know properties beforehand, ie. if you know total length beforehand and you've already found a string (terminated) which is larger than num digits remaining etc, but I don't think you can do much better – Alex T 2 hours ago

1

@Soke the approach needs O(1) space and O(n) steps. I think it is quite efficient and I doubt you can do it any faster without posing additional requirements on input strings. – Alik 2 hours ago

1

You could parallelize it (if the String is large enough for that to make sense): Partition the string, for each partition return the length of the longest chain, the longest prefix and the longest suffix. – Thilo 2 hours ago

If it is a binary number ( I think this is what you mean when you say "raw bits"). There are bit hacks you can do to improve on O(n) complexity of the average case. Worst case (alternating 0 and 1) is still O(n) though. Pretty sure the OP is asking about a str though – John La Rooy 1 hour ago

| show 3 more comments

n.m. · Answer 2 · 2015-06-09 05:08:27Z

It should be possible to beat the obvious algorithm. The idea is, if you aleready have a run of 0s of length N and you see two 1s at no more than N positions apart, you don't need to check any positions in between. So check a candidate sequence of zeroes from the end rather than from the beginning. In the very worst case you will just check all the elements, just as in the naïve method, but on the average it will be less than that.

So the algo goes like this (pseudocode, untested)

  maxrun = 0
  curpos = 0
  runstart = 0
  runend = 0

  while curpos + maxrun < array.length
      broken = false
      for i = curpos + maxrun, i >= curpos and not broken, --i
        if array[i] == 1
          broken = true
          curpos = i + 1

      if not broken
        runstart = curpos
        # found a longer run of 0s
        # now extend it to the end
        maxrun++
        curpos += maxrun
        while curpos < array.length and array[curpos] == 0
          maxrun++
        # ok found the 1 at the right end of the run
        # go to the next position and start over
        runend = curpos
        curpos++

 # the longest run of 0s is [runstart, runend)

Bhoot · Answer 3 · 2015-06-09 04:04:17Z

up vote 1 down vote

To find the largest run of consecutive zeros in a binary string, I suggest the following outline:

int maxConsecutiveZeros(String binaryString) {
    int maxcount = Integer.MIN_VALUE;
    int currcount = 0;
    for(int i=0; i < binaryString.length(); i++) {
        if(binaryString.charAt(i) == '0') {
            currcount++;
        } else {
            maxcount = Math.max(currcount, maxcount);
            currcount = 0;
        }
    }
    return maxcount;
}

You should separately handle the case where binaryString ends in zero. Add that part to the provided outline and you are done.

The complexity of this approach is linear in the length of binary string.

answered 2 hours ago

Bhoot
858317

He asked for Python... this isn't even pseudo code... – Travis Griggs 1 hour ago

The part in bold can be done in 2 seconds with a cut and paste. Why not just do it in your "pseudo code" – John La Rooy 1 hour ago

add a comment |

paxdiablo · Answer 4 · 2015-06-09 04:52:04Z

It depends on what you mean by efficient.

If your intent is to minimise runtime, you'll basically have to go over the string character by character, analysing the runs of consecutive zeros and keeping track of the longest, something like:

def longRunZeros(s):
    big = 0
    curr = 0
    for c in s:
        if c == '0':
            curr += 1
        else:
            if curr > big:
                big = curr
            curr = 0
    if curr > big:
        big = curr
    return big

print longRunZeros('0010011010000')

If you're talking about programmer efficiency, just do:

def longRunZeros(s):
    return max(len(i) for i in s.split('1'))

instead.

It won't necessarily run the fastest but it'll free you up to have more time, perhaps to be used in analysing whether you even need raw speed for this operation. It'll also almost certainly be less prone to bugs to to its length.

As to whether you need the speed, consider this. With a 25M string, the character-by-character method takes 2.826 seconds CPU time for a million iterations. The split method takes 3.186 seconds for the same workload¹.

So, unless your string are a lot longer than 25M or you need to do it much more than a million times, it's not going to make a lot of difference, and I'd tend to opt for the method that's easier for me as a developer.

Addendum: after espousing how irrelevant differential performance is here, I feel a bit hypocritical mentioning that another method shown by John La Rooy in a comment actually seems to be a bit faster than both of mine.

But, for completeness, I'll suffer the slings and arrows to point that one out as well:

def longRunZeros(s):
    return len(max(s.split('1')))

That appears to average out at about 1.092 which is double the speed of the character-by-character case above.

¹ Those figures are averages of five runs, in my environment, and I make no guarantee they'll hold up anywhere else.

If you've ever been involved in optimisation efforts, you should know that it should be measured in your actual environments, rather than depending on the say-so of some random (but extremely good-looking) person on the internet :-)

len(max(s.split('1'))) should work too (528ns vs 1280ns on this computer for that testcase) — John La Rooy, 1 hour ago
@John, that actually seems quite a bit faster than both methods. Although I still maintain it's probably unnecessary micro-optimisation, I've included your better method for completeness. — paxdiablo, 1 hour ago
I'll argue that I can more easily convince myself that either of the versions using max are correct. If readable coincides with being fast enough, it's a happy day — John La Rooy, 1 hour ago

samgak · Answer 5 · 2015-06-09 05:00:39Z

Okay, as someone mentioned, it the type is String, then I think you cannot escape O( |N| ) which is I/O time. I here just want to say it it is an integer, then you may do a bit faster, for example:

#include<bits/stdc++.h>
using namespace std;
int n;

void binary(int x){
    if(x){
        binary(x>>1);
        if(x&1) putchar('1');
        else putchar('0');
    }

}

int main() {
    scanf("%d", &n);
    while(n){
        binary(n);
        puts("");
        int x = log2(n&-n);
        printf("Zero range: %d\n", x);
        n >>= (x+1);
    }
    return 0;
}

Ignore the printing part, I think it is O(lg N)? (Caution: as this is handling integer, no padding zero is considered, it should not be hard though)

Shashank · Answer 6 · 2015-06-09 05:13:07Z

up vote 0 down vote

A compiled regex might be decently fast, but I haven't really tested it. Nevertheless:

>>> binstr = '0010011010000'
>>> import re
>>> zeros = re.compile(r'0+')
>>> max(len(m) for m in zeros.findall(binstr))
4

answered 1 hour ago

Shashank
7,7562934

add a comment |

asked	today
viewed	77 times
active	today

current community

your communities

more stack exchange communities

Efficient algorithm to find the largest run of zeros in a binary string?

6 Answers 6

Your Answer

Not the answer you're looking for? Browse other questions tagged python algorithm or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Efficient algorithm to find the largest run of zeros in a binary string?

6 Answers 6

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python algorithm or ask your own question.

Related

Hot Network Questions