Stack Overflow is a community of 4.7 million programmers, just like you, helping each other.

Join them; it only takes a minute:

Sign up
Join the Stack Overflow community to:
  1. Ask programming questions
  2. Answer and help your peers
  3. Get recognized for your expertise

Hi, I have this string in Python:

'Every Wednesday and Friday, this market is perfect for lunch! Nestled in the Minna St. tunnel (at 5th St.), this location is great for escaping the fog or rain. Check out live music every Friday.\r\n\r\nLocation: 5th St. @ Minna St.\r\nTime: 11:00am-2:00pm\r\n\r\nVendors:\r\nKasa Indian\r\nFiveten Burger\r\nHiyaaa\r\nThe Rib Whip\r\nMayo & Mustard\r\n\r\n\r\nCATERING NEEDS? Have OtG cater your next event! Get started by visiting offthegridsf.com/catering.'

I need to extract the following:

Location: 5th St. @ Minna St.
Time: 11:00am-2:00pm

Vendors:
Kasa Indian
Fiveten Burger
Hiyaaa
The Rib Whip
Mayo & Mustard

I tried to do this by using:

val = desc.split("\r\n")

and then val[2] gives the location, val[3] gives the time and val[6:11] gives the vendors. But I am sure there is a nicer, more efficient way to do this.

Any help will be highly appreciated.

share|improve this question
    
I think you've got it right, actually. I assume this is part of a more general problem? ie, are there always going to be 5 vendors? Are there going to possibly be additional lines before the third, so that the time would be val[?]. Otherwise, you've got it right. – audiodude Feb 11 '14 at 0:43
up vote 1 down vote accepted

If your input is always going to formatted in exactly this way, using str.split() is preferable. If you want something slightly more resilient, here's a regex approach, using re.VERBOSE and re.DOTALL:

import re

desc_match = re.search(r'''(?sx)
    (?P<loc>Location:.+?)[\n\r]
    (?P<time>Time:.+?)[\n\r]
    (?P<vends>Vendors:.+?)(?:\n\r?){2}''', desc)

if desc_match:
    for gname in ['loc', 'time', 'vends']:
        print desc_match.group(gname)

Given your definition of desc, this prints out:

Location: 5th St. @ Minna St.
Time: 11:00am-2:00pm

Vendors:
Kasa Indian
Fiveten Burger
Hiyaaa
The Rib Whip
Mayo & Mustard

Efficiency really doesn't matter here because the time is going to be negligible either way (don't optimize unless there is a bottleneck.) And again, this is only "nicer" if it works more often than your solution using str.split() - that is, if there are any possible input strings for which your solution does not produce the correct result.

If you only want the values, just move the prefixes outside of the group definitions (a group is defined by (?P<group_name>...))

r'''(?sx)
    Location: \s* (?P<loc>.+?)   [n\r]
    Time:     \s* (?P<time>.+?)  [\n\r]
    Vendors:  \s* (?P<vends>.+?) (?:\n\r?){2}'''
share|improve this answer
    
Thanks, this works better (i.e. more general). Do you know how I can use regex further to extract each of the values. I want to store the time, location and vendors in my model. I can do a split on ":" but then the time case won't work. And I want to make it part of a nested loop. Thanks! – user2216194 Feb 13 '14 at 5:21
    
Edited my answer. What is causing difficulty with nesting this inside a loop? It's not inefficient to repeatedly call re.search - Python keeps a cache of regular expressions so that it does not have to repeatedly compile the same one. – GVH Feb 13 '14 at 7:26
NLNL = "\r\n\r\n"

parts = s.split(NLNL)
result = NLNL.join(parts[1:3])
print(result)

which gives

Location: 5th St. @ Minna St.
Time: 11:00am-2:00pm

Vendors:
Kasa Indian
Fiveten Burger
Hiyaaa
The Rib Whip
Mayo & Mustard
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.