Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Giving the following example list the task is: find out whether the second expression is in the first one. * shall be considered as a wildcard for any character and one should be able to escape it \*

Hello,ell
This is good, is
CodeEval,C*Eval
Old,Young

should evaluate to

true
true
true
false

I wrote the following code which solves this problem. However, my solution seems very cumbersome and slow. Do you know how to improve it? (The task is for CodeEval and this solution is actually too slow. The actual solution which succeeded is much shorter but it does not check if the search_exprs are in the right order)

import sys
import re

def all_occ(ss, s):
    """ Finds All Occurences of substring ss in string s and returns
    for each occurence its last index
    """
    start = 0
    while True:
        try:
            start = s.index(ss,start)+len(ss)
        except ValueError:
            break
        yield start


def is_ss(s, ss):
    search_exprs = [x.replace('\*', '*') for x in re.split(r'(?<!\\)\*', ss)]

    # Get the list of all occurences of each search expression
    find_indices = [list(all_occ(sub, s)) for sub in search_exprs]

    # If any of the indices is not found, then the expression does not match
    if not any(find_indices):
        return "false"

    # Check if the expressions are in the right order
    start = 0
    for el in find_indices:
        start = min([e for e in el if e>start])
        if not start:
            return "false"

    return "true"

for line in open(sys.argv[1]).read().splitlines():
    if line:
        print is_ss(*line.split(','))

EDIT: Just fyi: here the actual solution that has been accepted by codeeval but would fail if the expressions were in the wrong ordere:

import sys
import re

def is_ss(s, ss):
    search_exprs = [x.replace('\*', '*') for x in re.split(r'(?<!\\)\*', ss)]

    if any(map(lambda x: x in s, search_exprs)):
        return "true"
    else:
        return "false"

for line in open(sys.argv[1]).read().splitlines():
    if line:
        print is_ss(*line.split(','))
share|improve this question
    
Should one be able to escape escapes? –  icktoofay Jun 21 at 23:47
    
No, one should not :) –  ProfHase85 Jun 22 at 8:26
add comment

1 Answer

Your solution seems like an overkill for what is asked. You only need to answer if the second expression is in the first one. In your solution, you are searching for ALL occurrences, no wonder why it takes some extra time :)

For me, the most elegant solution would be like:

def is_ss(s, ss): # the signature can be improved for clarity :)
    return ss in s 

Now, you know you can't do that because of the * cse, so you will need some regex to handle this. I think the function can be as "simple" as (not tested though):

def is_ss(s, ss):
    return re.match(re.sub(some_pattern, some_other_pattern, ss), s) is not None

About this piece of code:

  • re.match returns a MatchObject if it matches something or None if it doesn't.
  • re.sub is the regex-based replace. While it might be possible to use the string replace (with other functions), it will be clearer to do it this way.
  • I'm no regex expert (and it is 7am) so you'll have to figure out some_pattern and some_other_pattern used for the * case yourself, sorry!
share|improve this answer
    
Your aproach was the thing I tried in first place. It was even more cumbersome because: The 'some pattern' would mean to replace * not preceeded by ` by .*. (as *) is the right regex for this. now the problem is: What about all other characters that are possibly in my ss (i.e. ., [], \w`) and so on. They should be escaped as well. re.escape would not help, as it would escape everything. –  ProfHase85 Jun 22 at 8:33
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.