I am writing a Python library that parses different working-hours strings and produces a standard format of hours. I am stuck on the following case:

My regex should return groups for Mon - Fri 7am - 5pm Sat 9am - 3pm as ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm'] but if there is a comma between first and second then it should return [].

Also, a comma may appear anywhere else in the string, just not between two weekday-duration groups. E.g. Mon - Fri 7am - 5pm Sat 9am - 3pm and available upon email, phone call should return ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm'].

This is what I have tried,

import re
pattern = r"""(
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
\s*[-|to]+\s* # Separator
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?  # End weekday
\s*[from]*\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Start hour
\s*[-|to]+\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Close hour
)"""

regEx = re.compile(pattern, re.IGNORECASE|re.VERBOSE)

print(re.findall(regEx, "Mon - Fri 7am - 5pm Sat 9am - 3pm"))
# output ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm']
print(re.findall(regEx, "Mon - Fri 7am - 5pm Sat - Sun 9am - 3pm"))
# output ['Mon - Fri 7am - 5pm ', 'Sat - Sun 9am - 3pm']
print(re.findall(regEx, "Mon - Fri 7am - 5pm, Sat 9am - 3pm"))
# expected output []
# but I get ['Mon - Fri 7am - 5pm,', 'Sat 9am - 3pm']
print(re.findall(regEx, "Mon - Fri 7am - 5pm , Sat 9am - 3pm"))
# expected output []
# but I get ['Mon - Fri 7am - 5pm ', 'Sat 9am - 3pm']

I also tried a negative lookahead in my regex:

pattern = r"""(
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs)
\s*[-|to]+\s*
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?
\s*[from]*\s*
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?)
\s*[-|to]+\s*
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?)
(?![^,])
)"""

But I didn't get the expected result. Should I write explicit code to check this condition? Is there any way to do it just by changing my regex instead?

Another approach I would like to implement is to insert a comma between two weekday-duration groups when it is missing, and then change my regex to group/split by comma. "Mon - Fri 7am - 5pm Sat 9am - 3pm" => "Mon - Fri 7am - 5pm, Sat 9am - 3pm"

How about you remove the comma? Wouldn't that be easier? Filter out the comma before sending it to re. – CppLearner Feb 7 '13 at 9:24
    
I need the comma for further processing, so I would like to add a comma between the two weekday-duration groups when the input is "Mon - Fri 7am - 5pm Sat 9am - 3pm". I will edit my question now. By "further processing" I mean that I already have a parser that standardizes the string if the comma exists. – underscore Feb 7 '13 at 9:40

3 Answers

I think you can do it simply by anchoring the pattern so that it has to match the whole string, so that commas (and any other stray characters) are not allowed:

pattern = r"""^(
(
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
\s*[-|to]+\s* # Separator
(?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?  # End weekday
\s*[from]*\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Start hour
\s*[-|to]+\s* # Separator
(?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Close hour
)
)+$"""

This will output:

[('Sat 9am - 3pm', 'Sat 9am - 3pm')]
[('Sat - Sun 9am - 3pm', 'Sat - Sun 9am - 3pm')]
[]
[]

Hope it helps,

Did you notice that both items in the array are the same? – underscore Feb 7 '13 at 10:17
    
Yes I'm trying to find a better solution :) (actually I can't figure out why it doesn't work when you remove the first enclosing parenthesis...) – Y__ Feb 7 '13 at 10:19
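
For what it's worth, the duplicated, last-only result above looks like the usual behaviour of a repeated capturing group in Python's re module: when a group is repeated with +, only its final iteration is kept. A minimal sketch with a toy pattern (the pattern and sample string are made up for illustration, not taken from the question):

```python
import re

# Same shape as the anchored pattern above: an outer group repeated with +,
# wrapping an inner group. Both groups keep only their last iteration.
repeated = re.compile(r"^(([a-z]+\d\s*))+$")
print(repeated.findall("a1 b2 c3"))

# Matching one item at a time, without the anchors and the repetition,
# returns every occurrence instead.
single = re.compile(r"[a-z]+\d")
print(single.findall("a1 b2 c3"))
```

The first call yields a single tuple holding the last item twice, which mirrors the [('Sat 9am - 3pm', 'Sat 9am - 3pm')] output; the second call returns all three items.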

I wrote a few lines of code that check for a comma between two weekday-duration groups and insert one if it doesn't exist. That way I always get the same format, "Mon - Fri 7am - 5pm, Sat 9am - 3pm", and I can proceed further.
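
The code itself wasn't posted, so the following is only a rough sketch of how such a comma-insertion step might look. DAY, TIME, SCHEDULE and insert_commas are hypothetical names, and the pattern is a simplified stand-in for the one in the question (full three-letter weekdays only), not the original:

```python
import re

# Simplified stand-in for the weekday/duration pattern (an assumption).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
TIME = r"\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?"
SCHEDULE = r"%s(?:\s*(?:-|to)\s*%s)?\s*(?:from\s*)?%s\s*(?:-|to)\s*%s" % (
    DAY, DAY, TIME, TIME)

# One schedule, then whitespace only (no comma), then another schedule.
# The lookahead leaves the second schedule unconsumed so it can be the
# left-hand side of the next match.
MISSING_COMMA = re.compile(r"(%s)\s+(?=%s)" % (SCHEDULE, SCHEDULE),
                           re.IGNORECASE)


def insert_commas(text):
    """Hypothetical helper: add ', ' between adjacent schedules lacking one."""
    return MISSING_COMMA.sub(r"\1, ", text)


print(insert_commas("Mon - Fri 7am - 5pm Sat 9am - 3pm"))
```

Strings that already contain the comma should pass through unchanged, so the downstream comma-based parser sees one format either way.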

I couldn't figure out how to do that in a single regex; it's hard, and it's a nice question. I can do what you need, but be aware that I'm not proud of it.

Supposing you have a function to do that...

import re

def sample_funct(unparsed_schedule):
    result = []

    # Day pattern (raw string so the regex backslashes reach re untouched)
    pattern = r"""
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|m|w|f|thurs) # Start weekday
    \s*[-|to]+\s* # Separator
    (?:mon|tue|wed|thu|fri|sat|sun|mo|tu|we|th|fr|sa|su|^(?![ap])m|w|f|thurs)?  # End weekday
    \s*[from]*\s* # Separator
    (?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][\.]?m\.?) # Start hour
    \s*[-|to]+\s* # Separator
    (?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][\.]?m\.?) # Close hour
    """

    # No-commas pattern: two schedules with one non-comma character between them
    pattern2 = r"%s\s*[^,]\s*%s" % (pattern, pattern)

    # Actual regex pattern items
    schedule     = re.compile(pattern, re.IGNORECASE|re.VERBOSE)
    remove_comma = re.compile(pattern2, re.IGNORECASE|re.VERBOSE)

    # Check we have no commas in the middle
    valid_result = re.search(remove_comma, unparsed_schedule)
    if valid_result:
        # Positive result, return the list with schedules
        result = re.findall(schedule, valid_result.group(0))

    # If there are no valid results, this returns an empty list
    return result
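
If the doubled pattern feels heavy, the same "no comma in the middle" check can probably be done by looking at the text between neighbouring matches. This is only a sketch: DAY, TIME, SCHEDULE and schedules_without_commas are hypothetical names, and the pattern is a simplified stand-in for the one above, not a drop-in replacement:

```python
import re

# Simplified stand-in for the weekday/duration pattern (an assumption).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
TIME = r"\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?"
SCHEDULE = re.compile(
    r"%s(?:\s*(?:-|to)\s*%s)?\s*(?:from\s*)?%s\s*(?:-|to)\s*%s"
    % (DAY, DAY, TIME, TIME),
    re.IGNORECASE,
)


def schedules_without_commas(text):
    """Return every schedule, or [] if a comma sits between two of them."""
    matches = list(SCHEDULE.finditer(text))
    # Inspect only the gap between each pair of neighbouring matches, so a
    # comma elsewhere in the string does not matter.
    for left, right in zip(matches, matches[1:]):
        if "," in text[left.end():right.start()]:
            return []
    return [m.group(0) for m in matches]


print(schedules_without_commas("Mon - Fri 7am - 5pm Sat 9am - 3pm"))
print(schedules_without_commas("Mon - Fri 7am - 5pm, Sat 9am - 3pm"))
```

Unlike the two-pattern search, this handles any number of schedules in one pass, at the cost of a few lines of explicit checking.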
    
Thanks! I converted the strings to a uniform format, i.e. inserted commas where there were none. – underscore Feb 14 '13 at 8:06