Parsing a string - regex help in Python

Question

Hi, I have this string in Python:

'Every Wednesday and Friday, this market is perfect for lunch! Nestled in the Minna St. tunnel (at 5th St.), this location is great for escaping the fog or rain. Check out live music every Friday.\r\n\r\nLocation: 5th St. @ Minna St.\r\nTime: 11:00am-2:00pm\r\n\r\nVendors:\r\nKasa Indian\r\nFiveten Burger\r\nHiyaaa\r\nThe Rib Whip\r\nMayo & Mustard\r\n\r\n\r\nCATERING NEEDS? Have OtG cater your next event! Get started by visiting offthegridsf.com/catering.'

I need to extract the following:

Location: 5th St. @ Minna St.
Time: 11:00am-2:00pm

Vendors:
Kasa Indian
Fiveten Burger
Hiyaaa
The Rib Whip
Mayo & Mustard

I tried to do this by using:

val = desc.split("\r\n")

and then val[2] gives the location, val[3] gives the time and val[6:11] gives the vendors. But I am sure there is a nicer, more efficient way to do this.

Any help will be highly appreciated.

I think you've got it right, actually. I assume this is part of a more general problem? I.e., are there always going to be 5 vendors? Could there be additional lines before the third, so that the time would no longer be at val[3]? Otherwise, you've got it right. - audiodude, Feb 11 '14 at 0:43

GVH · Accepted Answer · 2014-02-13 07:24:21Z

If your input is always going to be formatted in exactly this way, using str.split() is preferable. If you want something slightly more resilient, here's a regex approach, using re.VERBOSE and re.DOTALL:

import re

desc_match = re.search(r'''(?sx)
    (?P<loc>Location:.+?)[\n\r]
    (?P<time>Time:.+?)[\n\r]
    (?P<vends>Vendors:.+?)(?:\n\r?){2}''', desc)

if desc_match:
    for gname in ['loc', 'time', 'vends']:
        print(desc_match.group(gname))

Given your definition of desc, this prints out:

Location: 5th St. @ Minna St.
Time: 11:00am-2:00pm

Vendors:
Kasa Indian
Fiveten Burger
Hiyaaa
The Rib Whip
Mayo & Mustard

Efficiency really doesn't matter here, because the time is going to be negligible either way (don't optimize unless there is a bottleneck). And again, the regex is only "nicer" if it works more often than your solution using str.split(), that is, if there are possible input strings for which your solution does not produce the correct result.

If you only want the values, just move the prefixes outside of the group definitions (a named group is defined by (?P<group_name>...)):

r'''(?sx)
    Location: \s* (?P<loc>.+?)   [\n\r]
    Time:     \s* (?P<time>.+?)  [\n\r]
    Vendors:  \s* (?P<vends>.+?) (?:\n\r?){2}'''
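
To store the values in a model, the captured groups probably need a little cleanup first: because the groups are lazy and the input uses "\r\n" line endings, a captured value can end with a stray "\r" or blank line. The following is a minimal sketch, assuming the description always carries the Location/Time/Vendors labels from the question. The function name parse_desc, the dict layout and the shortened sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Labels sit outside the named groups, so only the values are captured.
DESC_RE = re.compile(r'''(?sx)
    Location: \s* (?P<loc>.+?)   [\n\r]
    Time:     \s* (?P<time>.+?)  [\n\r]
    Vendors:  \s* (?P<vends>.+?) (?:\n\r?){2}''')


def parse_desc(desc):
    """Return a dict with location, time and a vendor list, or None if no match."""
    match = DESC_RE.search(desc)
    if match is None:
        return None
    vendor_lines = match.group('vends').splitlines()
    return {
        # strip() removes the trailing '\r' / blank lines the lazy groups pick up.
        'location': match.group('loc').strip(),
        'time': match.group('time').strip(),
        # One vendor per line; empty lines are skipped.
        'vendors': [v.strip() for v in vendor_lines if v.strip()],
    }


# Shortened, hypothetical stand-in for the description in the question.
desc = ('Intro text.\r\n\r\nLocation: 5th St. @ Minna St.\r\n'
        'Time: 11:00am-2:00pm\r\n\r\nVendors:\r\nKasa Indian\r\n'
        'Fiveten Burger\r\n\r\n\r\nCATERING NEEDS?')
print(parse_desc(desc))
```

Returning None for a non-matching description lets the caller decide whether to skip the record or report it.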

Thanks, this works better (i.e. it is more general). Do you know how I can use the regex further to extract each of the values? I want to store the time, location and vendors in my model. I can do a split on ":", but then the time case won't work. And I want to make it part of a nested loop. Thanks! - user2216194, Feb 13 '14 at 5:21
Edited my answer. What is causing difficulty with nesting this inside a loop? It's not inefficient to call re.search repeatedly: the re module keeps a cache of compiled patterns, so it does not have to recompile the same one each time. - GVH, Feb 13 '14 at 7:26
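
For the loop case raised in the comments, one option is to make the reuse explicit by compiling the pattern once with re.compile and calling it inside the loop. This is only a sketch: the descriptions list is made-up sample data, and the single-line TIME_RE pattern is a simplified stand-in, not the pattern from the answer.

```python
import re

# Hypothetical sample data standing in for the real descriptions.
descriptions = [
    'Intro\r\n\r\nLocation: 5th St. @ Minna St.\r\nTime: 11:00am-2:00pm\r\n',
    'Intro\r\n\r\nLocation: Somewhere Else\r\nTime: 5:00pm-9:00pm\r\n',
    'No schedule in this one',
]

# Compiled once, outside the loop; [^\r\n]+ stops at the end of the line.
TIME_RE = re.compile(r'Time:\s*(?P<time>[^\r\n]+)')

times = []
for desc in descriptions:
    match = TIME_RE.search(desc)
    # Record None when a description has no "Time:" line.
    times.append(match.group('time') if match else None)

print(times)
```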

Hugh Bothwell · Answer 2 · 2014-02-11 00:45:05Z


NLNL = "\r\n\r\n"

parts = desc.split(NLNL)
result = NLNL.join(parts[1:3])
print(result)

which gives

Location: 5th St. @ Minna St.
Time: 11:00am-2:00pm

Vendors:
Kasa Indian
Fiveten Burger
Hiyaaa
The Rib Whip
Mayo & Mustard
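
If the number of vendors or the position of the blocks can vary, as the first comment on the question asks, a split-based approach can look blocks up by their label instead of by index. Here is a sketch under the assumption that blocks are separated by a blank line ("\r\n\r\n") and that the labels are exactly those in the question; find_block is a hypothetical helper and the sample string is shortened.

```python
NLNL = "\r\n\r\n"


def find_block(text, label):
    """Return the first blank-line-separated block starting with label, or None."""
    for block in text.split(NLNL):
        block = block.strip()
        if block.startswith(label):
            return block
    return None


# Shortened, hypothetical stand-in for the description in the question.
desc = ('Intro text.\r\n\r\nLocation: 5th St. @ Minna St.\r\n'
        'Time: 11:00am-2:00pm\r\n\r\nVendors:\r\nKasa Indian\r\n'
        'Fiveten Burger\r\n\r\n\r\nCATERING NEEDS?')

# The first line of the block is the "Vendors:" label itself, so skip it.
vendors = find_block(desc, 'Vendors:').splitlines()[1:]
print(vendors)
```

This still breaks if a label is renamed, but it no longer depends on how many lines precede each block.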

