Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I'm looking for a way to test whether or not a given string repeats itself for the entire string or not.

Examples:

[
    '0045662100456621004566210045662100456621',             # '00456621'
    '0072992700729927007299270072992700729927',             # '00729927'
    '001443001443001443001443001443001443001443',           # '001443'
    '037037037037037037037037037037037037037037037',        # '037'
    '047619047619047619047619047619047619047619',           # '047619'
    '002457002457002457002457002457002457002457',           # '002457'
    '001221001221001221001221001221001221001221',           # '001221'
    '001230012300123001230012300123001230012300123',        # '00123'
    '0013947001394700139470013947001394700139470013947',    # '0013947'
    '001001001001001001001001001001001001001001001001001',  # '001'
    '001406469760900140646976090014064697609',              # '0014064697609'
]

are strings which repeat themselves, and

[
    '004608294930875576036866359447',
    '00469483568075117370892018779342723',
    '004739336492890995260663507109',
    '001508295625942684766214177978883861236802413273',
    '007518796992481203',
    '0071942446043165467625899280575539568345323741',
    '0434782608695652173913',
    '0344827586206896551724137931',
    '002481389578163771712158808933',
    '002932551319648093841642228739',
    '0035587188612099644128113879',
    '003484320557491289198606271777',
    '00115074798619102416570771',
]

are examples of ones that do not.

The repeating sections of the strings I'm given can be quite long, and the strings themselves can be 500 or more characters, so looping through each character trying to build a pattern then checking the pattern vs the rest of the string seems awful slow. Multiply that by potentially hundreds of strings and I can't see any intuitive solution.

I've looked into regexes a bit and they seem good for when you know what you're looking for, or at least the length of the pattern you're looking for. Unfortunately, I know neither.

How can I tell if a string is repeating itself and if it is, what the shortest repeating subsequence is?

share|improve this question

This question has an open bounty worth +150 reputation from Zero Piraeus ending in 5 days.

One or more of the answers is exemplary and worthy of an additional bounty.

David Zhang's answer is easily the most elegant, and by far the best-performing. Unfortunately, it hasn't had the attention it deserves, because it was posted some time after my answer and Shashank's started to get lots of upvotes. Hopefully this bounty will help to rectify that.

7  
Question: are 'abcabcab' or 'ababa' considered repeating? –  Shashank Apr 6 at 23:11
4  
looping through each character trying to build a pattern then checking the pattern vs the rest of the string seems awful slow - but is it? –  Tim Castelijns Apr 6 at 23:19
1  
@TimCastelijns Admittedly, I have not actually tested that. –  John Apr 6 at 23:27
1  
possible duplicate of Writing a regex to detect repeat-characters –  Avinash Raj Apr 6 at 23:41
3  
@AvinashRaj The OP is asking about all possible solutions. The question you link to only accepts regex solution. Note that regex may be able to solve the problem but in much more time than necessary. For example an optimal solution (i.e. linear time) would use the suffix tree of the text. You just have to find the longest repeated substring and do some checks on the lengths. –  Bakuriu Apr 7 at 16:28

11 Answers 11

up vote 90 down vote accepted

Here's a simple solution which appears to beat the currently accepted answer in performance:

def principal_period(s):
    i = (s+s).find(s, 1, -1)
    return None if i == -1 else s[:i]

Here are the results of @zero's revision of @davidism's benchmark, with the above function added:

average performance
principal_period: 0.011421355195050506
tigerhawk: 0.014430824859362344
davidism: 0.02022206339939885
zero: 0.029442115473107475
shashank: 0.04027841852348177 with failures
saksham_varma: 0.04634534486664279

This is based on the (somewhat clever) observation that a string repeats itself if and only if it is equal to a nontrivial rotation of itself. Kudos to @AleksiTorhamo for realizing that we can then recover the principal period from the index of s in (s+s)[1:-1], and for informing me of the optional start and end arguments of Python's string.find.

If you only need to check if a string is periodic, and don't need to extract the principal period, an even simpler/faster solution is possible:

def is_periodic(s):
    return s in (s+s)[1:-1]
share|improve this answer
5  
You can extend this to find the shortest repeating subsequence by using .find() or .index() instead of in, eg. (s+s).find(s, 1, -1). –  Aleksi Torhamo 2 days ago
6  
Also, I think (s+s).find(s, 1, -1) will be (very slightly) faster than (s+s)[1:-1].find(s), at least for larger strings, since the slicing means you have to create another copy of (nearly) the entire string. –  Aleksi Torhamo 2 days ago
7  
This is brilliant....o_o Nothing more to say. –  Shashank 2 days ago
5  
It's like if you take a sin or cos wave from a periodic function and shift it to the right. Since it's fully periodic, the waves will eventually match up perfectly...The math parallels to this solution are just phenomenal. :) I wish I could give you +∞ upvotes. –  Shashank 2 days ago
3  
Guido's recent update to PEP 8 is relevant here: "Be consistent in return statements. Either all return statements in a function should return an expression, or none of them should. If any return statement returns an expression, any return statements where no value is returned should explicitly state this as return None, and an explicit return statement should be present at the end of the function (if reachable)." –  Zero Piraeus yesterday

Here's a solution using regular expressions.

import re

REPEATER = re.compile(r"(.+?)\1+$")

def repeated(s):
    match = REPEATER.match(s)
    return match.group(1) if match else None

Iterating over the examples in the question:

examples = [
    '0045662100456621004566210045662100456621',
    '0072992700729927007299270072992700729927',
    '001443001443001443001443001443001443001443',
    '037037037037037037037037037037037037037037037',
    '047619047619047619047619047619047619047619',
    '002457002457002457002457002457002457002457',
    '001221001221001221001221001221001221001221',
    '001230012300123001230012300123001230012300123',
    '0013947001394700139470013947001394700139470013947',
    '001001001001001001001001001001001001001001001001001',
    '001406469760900140646976090014064697609',
    '004608294930875576036866359447',
    '00469483568075117370892018779342723',
    '004739336492890995260663507109',
    '001508295625942684766214177978883861236802413273',
    '007518796992481203',
    '0071942446043165467625899280575539568345323741',
    '0434782608695652173913',
    '0344827586206896551724137931',
    '002481389578163771712158808933',
    '002932551319648093841642228739',
    '0035587188612099644128113879',
    '003484320557491289198606271777',
    '00115074798619102416570771',
]

for e in examples:
    sub = repeated(e)
    if sub:
        print("%r: %r" % (e, sub))
    else:
        print("%r does not repeat." % e)

... produces this output:

'0045662100456621004566210045662100456621': '00456621'
'0072992700729927007299270072992700729927': '00729927'
'001443001443001443001443001443001443001443': '001443'
'037037037037037037037037037037037037037037037': '037'
'047619047619047619047619047619047619047619': '047619'
'002457002457002457002457002457002457002457': '002457'
'001221001221001221001221001221001221001221': '001221'
'001230012300123001230012300123001230012300123': '00123'
'0013947001394700139470013947001394700139470013947': '0013947'
'001001001001001001001001001001001001001001001001001': '001'
'001406469760900140646976090014064697609': '0014064697609'
'004608294930875576036866359447' does not repeat.
'00469483568075117370892018779342723' does not repeat.
'004739336492890995260663507109' does not repeat.
'001508295625942684766214177978883861236802413273' does not repeat.
'007518796992481203' does not repeat.
'0071942446043165467625899280575539568345323741' does not repeat.
'0434782608695652173913' does not repeat.
'0344827586206896551724137931' does not repeat.
'002481389578163771712158808933' does not repeat.
'002932551319648093841642228739' does not repeat.
'0035587188612099644128113879' does not repeat.
'003484320557491289198606271777' does not repeat.
'00115074798619102416570771' does not repeat.

The regular expression (.+?)\1+$ is divided into three parts:

  1. (.+?) is a matching group containing at least one (but as few as possible) of any character (because +? is non-greedy).

  2. \1+ checks for at least one repetition of the matching group in the first part.

  3. $ checks for the end of the string, to ensure that there's no extra, non-repeating content after the repeated substrings (and using re.match() ensures that there's no non-repeating text before the repeated substrings).

In Python 3.4 and later, you could drop the $ and use re.fullmatch() instead, or (in any Python at least as far back as 2.3) go the other way and use re.search() with the regex ^(.+?)\1+$, all of which are more down to personal taste than anything else.

share|improve this answer
3  
I have no idea why this concise two liner is not the highest-voted answer. The other answers are not bad, but this one is far better. (It may use the frequently denigrated regular expression, but I can inspect this far easier than the other much longer answers, which are full of nested blocks, potential typos, off-by-one errors, etc.) Ah well, worse is better I suppose. –  Paul Draper Apr 7 at 20:46
5  
I think there are two main reasons for that: 1) some programmers like math more than they like regexes, and 2) since varying the length and nature of the input strings makes different answers pull ahead on performance, the super-long edge case strings (which may not even appear in the real data) make this solution appear suboptimal. –  TigerhawkT3 Apr 7 at 21:26
1  
@PaulDraper: Guess what regex is doing behind the scene? it is parsing the string and stores each element until a posible reapeatition match occurs. Thats the same what OP statet as too slow. so just because it is a 2 liner there isn't any performance win. –  Zaibis yesterday
2  
@Zaibis, I'd normally agree, but this is both the shortest and fastest solution (stackoverflow.com/a/29482936/1212596)....Except for David's, which was posted after I made that comment. I actually like David's approach more (clever!). –  Paul Draper yesterday
3  
@jwg the edge cases in the benchmark concerned (which I helped to develop after @davidism first created it) are artificial, and very unlikely to occur: ten thousand 0s followed by a 1, for example (which is the far outlier on the chart). None of the other edge cases have anything like such poor performances in fact, despite being designed to trip up algorithms like this one. However, David Zhang's answer is clearly superior, which is why I started a bounty to promote it. –  Zero Piraeus 13 hours ago

You can make the observation that for a string to be considered repeating, its length must be divisible by the length of its repeated sequence. Given that, here is a solution that generates divisors of the length from 1 to n / 2 inclusive, divides the original string into substrings with the length of the divisors, and tests the equality of the result set:

def divisors(n):
    if n > 1: yield 1
    from math import sqrt, floor
    kq = []
    for k in range(2, int(floor(sqrt(n))) + 1):
        q, r = divmod(n, k)
        if r == 0:
            yield k
            kq.append(q)
    while kq:
        yield kq.pop()

def repeats(s):
    l = len(s)
    for d in divisors(l):
        spl = (s[i:i+d] for i in range(0, l, d))
        spl0 = next(spl)
        if all(x == spl0 for x in spl):
            return spl0
    return None

EDIT: In Python 3, the / operator has changed to do float division by default. To get the int division from Python 2, you can use the // operator instead. Thank you to @TigerhawkT3 for bringing this to my attention.

The // operator performs integer division in both Python 2 and Python 3, so I've updated the answer to support both versions. The part where we test to see if all the substrings are equal is now a short-circuiting operation using all and a generator expression.

UPDATE: In response to a change in the original question, the code has now been updated to return the smallest repeating substring if it exists and None if it does not. @godlygeek has suggested using divmod to reduce the number of iterations on the divisors generator, and the code has been updated to match that as well. It now returns all positive divisors of n in ascending order, exclusive of n itself.

share|improve this answer
2  
This solution is amazing. You could change the divisors method to follow the produce-consumer pattern. So that it yield's results as they are found. Will be a shame if this isn't the highest answer. Everything else is brute force. –  JustinDanielson Apr 6 at 23:33
3  
@JustinDanielson It returns a generator object created from a generator expression, which is an implicit producer :) It will lazy-evaluate the divisors. –  Shashank Apr 6 at 23:34
1  
Ohh. I didn't know that. Well, even better then. :D I understand why you want to avoid sqrt, but you could use n/2 as the upper bound for the divisor range. –  JustinDanielson Apr 6 at 23:36
1  
@JustinDanielson Thanks for the suggestion, the range upper bound is now (n/2) inclusive. –  Shashank Apr 6 at 23:42
1  
Should n / 2 in divisors() be n // 2? –  TigerhawkT3 Apr 6 at 23:55

Here are some benchmarks for the various answers to this question. There were some surprising results, including wildly different performance depending on the string being tested.

I had to modify some functions to work on Python 3. If you see something wrong, want to add your function, or want to add another test string, let me know.

In summary: there's about a 50x difference between the best- and worst-performing solutions for the large set of example data supplied by OP here (via this comment). David Zhang's solution is the clear winner, outperforming all others by at least 5x for the large example set.

A couple of the answers are very slow in extremely large "no match" cases. Otherwise, the functions seem to be equally matched or clear winners depending on the test.

Here are the results, including plots made (using the modified code here) using matplotlib and seaborn to show the different distributions:

Corpus 1 (supplied examples - small set)

mean performance:
 0.0006  david_zhang
 0.0017  zero
 0.0026  antti
 0.0029  carpetpython
 0.0034  tigerhawk_2
 0.0059  tigerhawk_1
 0.0062  davidism
 0.0068  saksham
 0.0100  riad
 0.0186  shashank

median performance:
 0.0006  david_zhang
 0.0016  zero
 0.0024  antti
 0.0027  carpetpython
 0.0032  tigerhawk_2
 0.0055  tigerhawk_1
 0.0061  davidism
 0.0068  saksham
 0.0103  riad
 0.0179  shashank

Corpus 1 graph Higher resolution image


Corpus 2 (supplied examples - large set)

mean performance:
 0.0011  david_zhang
 0.0072  antti
 0.0078  zero
 0.0079  carpetpython
 0.0133  davidism
 0.0147  tigerhawk_2
 0.0225  shashank
 0.0243  tigerhawk_1
 0.0350  riad
 0.0578  saksham

median performance:
 0.0007  david_zhang
 0.0036  zero
 0.0044  antti
 0.0048  carpetpython
 0.0072  tigerhawk_2
 0.0086  davidism
 0.0121  tigerhawk_1
 0.0151  riad
 0.0197  shashank
 0.0214  saksham

Corpus 2 graph Higher resolution image


Corpus 3 (edge cases)

mean performance:
 0.0750  david_zhang
 0.0779  carpetpython
 0.0955  antti
 0.3300  tigerhawk_2
 0.4258  tigerhawk_1
 0.4506  davidism
 1.3021  saksham
 1.3510  shashank
 7.1807  zero
11.5788  riad

median performance:
 0.0215  antti
 0.0218  carpetpython
 0.0221  tigerhawk_2
 0.0232  david_zhang
 0.0266  tigerhawk_1
 0.0469  saksham
 0.1325  davidism
 0.1961  zero
 2.1620  shashank
 3.6235  riad

Corpus 3 graph Higher resolution image


The tests and raw results are available here.

share|improve this answer
    
I read your other answer and I think you made a great observation that you can split the string into two parts first to avoid computations! :) I'm not sure if my solution returning 'aa' on that one test is 'failing it'. It would only be failing it if the requirement were to return the shortest repeating substring. Either way, nice benchmark, I liked it. –  Shashank Apr 7 at 3:13
2  
@Shashank based on some discussion in chat, the op wanted the shortest repetition if possible. There was no expectation for anyone to know that, but I had to test against something. –  davidism Apr 7 at 3:14
    
I made an optimization to my code which makes it so that it uses a generator instead of a list comprehension for spl. This should avoid quite a bit of computations and in my opinion will make it a bit faster (though probably still not as fast as your solution which has the neat idea to divide the string repeatedly by 2 and cleverly use str.split). –  Shashank Apr 7 at 4:25
    
Mine is faster than yours on 342 on those inputs but @zero's algo makes it really long to re-run the test suite :P –  Antti Haapala Apr 7 at 6:31
1  
@GrijeshChauhan the code which creates the chart is linked in the answer, but here it is again :-) It uses pandas, matplotlib and seaborn. –  Zero Piraeus 12 hours ago

Non-regex solution:

def repeat(string):
    for i in range(1, len(string)//2+1):
        if not len(string)%len(string[0:i]) and string[0:i]*(len(string)//len(string[0:i])) == string:
            return string[0:i]

Faster non-regex solution, thanks to @ThatWeirdo (see comments):

def repeat(string):
    l = len(string)
    for i in range(1, len(string)//2+1):
        if l%i: continue
        s = string[0:i]
        if s*(l//i) == string:
            return s

The above solution is very rarely slower than the original by a few percent, but it's usually a good bit faster - sometimes a whole lot faster. It's still not faster than davidism's for longer strings, and zero's regex solution is superior for short strings. It comes out to the fastest (according to davidism's test on github - see his answer) with strings of about 1000-1500 characters. Regardless, it's reliably second-fastest (or better) in all cases I tested. Thanks, ThatWeirdo.

Test:

print(repeat('009009009'))
print(repeat('254725472547'))
print(repeat('abcdeabcdeabcdeabcde'))
print(repeat('abcdefg'))
print(repeat('09099099909999'))
print(repeat('02589675192'))

Results:

009
2547
abcde
None
None
None
share|improve this answer
    
Isn't this a brute force solution? –  JustinDanielson Apr 6 at 23:23
5  
@JustinDanielson Yes. But a solution none the less. –  Quirliom Apr 6 at 23:37
3  
I'm seeing about 1e-5 to 3e-5 seconds for short strings, 3e-5 to 4e-5 seconds for successful long (1000-character) strings, and a little bit under a millisecond for unsuccessful long strings (worst case). So a thousand 1000-character strings would take about a second. Compared to the math answer, this finds matches 10 times faster, but takes 3 times longer to fail. –  TigerhawkT3 Apr 6 at 23:54
1  
len(string[0:i]) is always equal to i (in this case at least). Replacing these, and also saving len(string) and string[0:i] in variables might speed things up. Also IMO this is a great solution, awesome ;) –  ThatWeirdo Apr 7 at 20:25
1  
@ThatWeirdo: see the edits. Your suggestions make this solution the fastest (based on davidism's gist on github) for strings of around 1000-1500 characters, and second-fastest for shorter or longer strings. :) –  TigerhawkT3 Apr 7 at 21:01

First, halve the string as long as it's a "2 part" duplicate. This reduces the search space if there are an even number of repeats. Then, working forwards to find the smallest repeating string, check if splitting the full string by increasingly larger sub-string results in only empty values. Only sub-strings up to length // 2 need to be tested since anything over that would have no repeats.

def shortest_repeat(orig_value):
    if not orig_value:
        return None

    value = orig_value

    while True:
        len_half = len(value) // 2
        first_half = value[:len_half]

        if first_half != value[len_half:]:
            break

        value = first_half

    len_value = len(value)
    split = value.split

    for i in (i for i in range(1, len_value // 2) if len_value % i == 0):
        if not any(split(value[:i])):
            return value[:i]

    return value if value != orig_value else None

This returns the shortest match or None if there is no match.

share|improve this answer

This version tries only those candidate sequence lengths that are factors of the string length; and uses the * operator to build a full-length string from the candidate sequence:

def get_shortest_repeat(string):
    length = len(string)
    for i in range(1, length // 2 + 1):
        if length % i:  # skip non-factors early
            continue

        candidate = string[:i]
        if string == candidate * (length // i):
            return candidate

    return None

Thanks to TigerhawkT3 for noticing that length // 2 without + 1 would fail to match the abab case.

share|improve this answer
    
This solution is indeed virtually identical to my optimized one. I see that you have a range limit of length//2, just like I did - you have to change that to length//2+1 if you want to catch strings that are exactly doubled (e.g. 'aabaab'). –  TigerhawkT3 2 days ago
    
Ooops, you're correct :D –  Antti Haapala yesterday
    
And now they're identical! \o/ I need to pay more attention to optimization in the future, but I'm glad the algorithm itself was sound. –  TigerhawkT3 yesterday
    
@TigerhawkT3 yeah, but David Zhang beat us both with cleverness :D –  Antti Haapala yesterday

Here's a straight forward solution, without regexes.

For substrings of s starting from zeroth index, of lengths 1 through len(s), check if that substring, substr is the repeated pattern. This check can be performed by concatenating substr with itself ratio times, such that the length of the string thus formed is equal to the length of s. Hence ratio=len(s)/len(substr).

Return when first such substring is found. This would provide the smallest possible substring, if one exists.

def check_repeat(s):
    for i in range(1, len(s)):
        substr = s[:i]
        ratio = len(s)/len(substr)
        if substr * ratio == s:
            print 'Repeating on "%s"' % substr
            return
    print 'Non repeating'

>>> check_repeat('254725472547')
Repeating on "2547"
>>> check_repeat('abcdeabcdeabcdeabcde')
Repeating on "abcde"
share|improve this answer
    
Your solution works well, I added it to the benchmark. It's only noticeably slow with long strings that don't repeat. –  davidism Apr 7 at 6:01
    
@davidism Thanks! Yeah for long non repeating strings, it would have to check for substrings of all lengths, which would make it slow. –  Saksham Varma Apr 7 at 6:32
    
Now that I look at this one carefully, it seems to be nearly identical to my originally posted (before any edits) solution, with the only differences being saving the length and the substring. I guess I had a pretty good algorithm. :P –  TigerhawkT3 4 hours ago
    
@TigerhawkT3 Yeah indeed! :) –  Saksham Varma 4 hours ago

The problem may also be solved in O(n) in worst case with prefix function.

Note, it may be slower in general case(UPD: and is much slower) than other solutions which depend on number of divisors of n, but usually find fails sooner, I think one of bad cases for them will be aaa....aab, where there are n - 1 = 2 * 3 * 5 * 7 ... *p_n - 1 a's

First of all you need to calculate prefix function

def prefix_function(s):
    n = len(s)
    pi = [0] * n
    for i in xrange(1, n):
        j = pi[i - 1]
        while(j > 0 and s[i] != s[j]):
            j = pi[j - 1]
        if (s[i] == s[j]):
            j += 1
        pi[i] = j;
    return pi

then either there's no answer or the shortest period is

k = len(s) - prefix_function(s[-1])

and you just have to check if k != n and n % k == 0 (if k != n and n % k == 0 then answer is s[:k], else there's no answer

You may check the proof here (in Russian, but online translator will probably do the trick)

def riad(s):
    n = len(s)
    pi = [0] * n
    for i in xrange(1, n):
        j = pi[i - 1]
        while(j > 0 and s[i] != s[j]):
            j = pi[j - 1]
        if (s[i] == s[j]):
            j += 1
        pi[i] = j;
    k = n - pi[-1]
    return s[:k] if (n != k and n % k == 0) else None
share|improve this answer
    
Your prefix_function() is not valid Python: you have missing colons on your while and if statements, and && instead of and. After fixing those, it fails with UnboundLocalError: local variable 'i' referenced before assignment because of the line for i in range(i, n):. –  Zero Piraeus 2 days ago
1  
Zero, thanks, fixed these mistakes –  RiaD 2 days ago
    
Thanks :-) If you can put together a function that uses your prefix_function() to return similar results to the other answers – either the shortest substring or None – I'll include it in a revised benchmark I'm putting together. –  Zero Piraeus 2 days ago
    
@ZeroPiraeus, added –  RiaD 2 days ago
    
Sorry, it doesn't work right now, I'll check –  RiaD 2 days ago

I started with more than eight solutions to this problem. Some were bases on regex (match, findall, split), some of string slicing and testing, and some with string methods (find, count, split). Each had benefits in code clarity, code size, speed and memory consumption. I was going to post my answer here when I noticed that execution speed was ranked as important, so I did more testing and improvement to arrive at this:

def repeating(s):
    size = len(s)
    incr = size % 2 + 1
    for n in xrange(1, size//2+1, incr):
        if size % n == 0:
            if s[:n] * (size//n) == s:
                return s[:n]

This answer seems similar to a few other answers here, but it has a few speed optimisations others have not used:

  • xrange is a little faster in this application,
  • if an input string is an odd length, do not check any even length substrings,
  • by using s[:n] directly, we avoid creating a variable in each loop.

I would be interested to see how this performs in the standard tests with common hardware. I believe it will be well short of David Zhang's excellent algorithm in most tests, but should be quite fast otherwise.

I found this problem to be very counter-intuitive. The solutions I thought would be fast were slow. The solutions that looked slow were fast! It seems that Python's string creation with the multiply operator and string comparisons are highly optimised.

share|improve this answer
    
Not bad at all :-) The benchmark runs on Python 3.4 (partly because OP doesn't specify a version and that's what everyone should be using, and partly because it uses the new statistics module), so I had to change your /s to //s and replace xrange() with range() (which behaves like 2.x's xrange() in 3.x). –  Zero Piraeus yesterday
    
Here are the revisions to the benchmark, so you can review my changes, by the way. –  Zero Piraeus yesterday
    
Thanks Zero. That was fast. The results were slightly down on my predictions. I suspect the techniques I used for speed in Python 2.7 are not very effective in Python 3.4. Oh, well - a fun and educational exercise. –  CarpetPython yesterday
    
// in 3.x is integer division (just like the 2.x behaviour of /), while 3.x's / is float division (which I'm sure would be much slower even if it didn't break your solution by causing an attempt to use a float as an index). As mentioned, 3.x's range() is the same thing as 2.x's xrange(); no equivalent of 2.x's range() exists in 3.x. So I don't think that's the cause of any discrepancy between the benchmark and any timings you made. It's probably just that 3.x is slower overall than 2.x (or maybe your machine is faster than mine). –  Zero Piraeus yesterday
    
When I get some time, I shall have a close look at the run-time differences between Python 2 and Python 3. –  CarpetPython yesterday

I think the fastest solution would be to find a biggest common prefix-suffix of the given string. Then the string is made of at least 2 repeats when both of the following conditions are met:

  1. length of the biggest common prefix-suffix is at least half of the length of the string (prefixLength*2 >= stringLength)

  2. length of the prefix (or suffix) has a common divisor with string length. (stringLength % (stringLength - prefixLength) == 0)

Sorry for not giving a Python code, but it shouldn't be hard to rewrite.

#include <iostream>
#include <vector>
#include <string>

std::string biggestPrefixSuffix(const std::string& string) //using Morris-Pratt algorithm
{
    int stringLength = string.size();
    std::vector<int> pi(stringLength + 1);
    int t = -1;
    pi[0] = -1;

    for(int i = 1; i <= stringLength; ++i)
    {
        while((t > -1) && (string[t] != string[i - 1]))
        {
            t = pi[t];
        }
        ++t;
        pi[i] = t;
    }
    return std::string(string, 0, pi.back());
}
int main()
{
    std::string strings[] =
    {

        "0045662100456621004566210045662100456621",             //# '00456621'
        "0072992700729927007299270072992700729927",             //# '00729927'
        "001443001443001443001443001443001443001443",           //# '001443'
        "037037037037037037037037037037037037037037037",        //# '037'
        "047619047619047619047619047619047619047619",           //# '047619'
        "002457002457002457002457002457002457002457",           //# '002457'
        "001221001221001221001221001221001221001221",           //# '001221'
        "001230012300123001230012300123001230012300123",        //# '00123'
        "0013947001394700139470013947001394700139470013947",    //# '0013947'
        "001001001001001001001001001001001001001001001001001",  //# '001'
        "001406469760900140646976090014064697609",              //# '0014064697609'

        "004608294930875576036866359447",
        "00469483568075117370892018779342723",
        "004739336492890995260663507109",
        "001508295625942684766214177978883861236802413273",
        "007518796992481203",
        "0071942446043165467625899280575539568345323741",
        "0434782608695652173913",
        "0344827586206896551724137931",
        "002481389578163771712158808933",
        "002932551319648093841642228739",
        "0035587188612099644128113879",
        "003484320557491289198606271777",
        "00115074798619102416570771",
    };
    for(const auto& s : strings)
    {
        std::string prefixSuffix = biggestPrefixSuffix(s);
        std::cout << s << " --> ";
        if(s.length() % (s.length() - prefixSuffix.length()) == 0 && prefixSuffix.length() * 2 >= s.length())
        {
            std::cout << std::string(s, prefixSuffix.length());
        }
        else
        {
            std::cout << "string does not repeat";
        }
        std::cout << '\n';
    }
    return 0;
}

outputs:

0045662100456621004566210045662100456621 --> 00456621
0072992700729927007299270072992700729927 --> 00729927
001443001443001443001443001443001443001443 --> 001443
037037037037037037037037037037037037037037037 --> 037
047619047619047619047619047619047619047619 --> 047619
002457002457002457002457002457002457002457 --> 002457
001221001221001221001221001221001221001221 --> 001221
001230012300123001230012300123001230012300123 --> 00123
0013947001394700139470013947001394700139470013947 --> 0013947
001001001001001001001001001001001001001001001001001 --> 001
001406469760900140646976090014064697609 --> 0014064697609
004608294930875576036866359447 --> string does not repeat
00469483568075117370892018779342723 --> string does not repeat
004739336492890995260663507109 --> string does not repeat
001508295625942684766214177978883861236802413273 --> string does not repeat
007518796992481203 --> string does not repeat
0071942446043165467625899280575539568345323741 --> string does not repeat
0434782608695652173913 --> string does not repeat
0344827586206896551724137931 --> string does not repeat
002481389578163771712158808933 --> string does not repeat
002932551319648093841642228739 --> string does not repeat
0035587188612099644128113879 --> string does not repeat
003484320557491289198606271777 --> string does not repeat
00115074798619102416570771 --> string does not repeat

@edit I just realized that RiaD may be talking about the same approach.

share|improve this answer
    
Yes, I think this may be the same as RiaD's solution (which was extremely slow for many cases) - and, yes, the question was for a Python solution. –  TigerhawkT3 yesterday

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.