Is there a way I can link/associate an object (in this particular case a function) with a regular expression pattern?
Here's the function I'm working on:
```python
import re


def get_command(url):
    """
    Return command for downloading a particular url.
    >>> get_command('http://randomsite.com/XYZ')
    ['wget', '--content-disposition', '--continue', '--quiet', 'http://randomsite.com/XYZ']
    >>> get_command('http://examplesite.com/XYZ')
    ['/usr/bin/torify', 'wget', '--content-disposition', '--continue', '--quiet', '--referer=http://referersite.com', 'http://examplesite.com/XYZ']
    >>> get_command('https://mega.nz/%23!xd432Afds')
    ['megadl', '--no-progress', 'https://mega.nz/#!xd432Afds']
    >>> get_command('https://mega.nz/#!s2JHj1fds')
    ['megadl', '--no-progress', 'https://mega.nz/#!s2JHj1fds']
    >>> get_command('http://othersite.com/XYZ')
    ['downloadtool', 'http://othersite.com/XYZ']
    """
    base = ['wget', '--content-disposition', '--continue', '--quiet']
    # examplesite.com via torified wget and special referer
    if re.match(r'(https?://)?examplesite\.com/.+$', url):
        return ['/usr/bin/torify'] + base + \
            ['--referer=http://referersite.com', url]
    # MEGA via megadl of megatools
    elif re.match(r'https://mega\.nz/.+$', url):
        # fix url if necessary
        return ['megadl', '--no-progress', url.replace('%23', '#', 1)]
    # othersite.com via a dedicated download tool
    elif re.match(r'(https?://)?othersite\.com/.+$', url):
        return ['downloadtool', url]
    # default wget command
    else:
        return base + [url]
```
I think the code above is pretty straightforward, but it got me wondering whether there's a good way to refactor it. As the if-chain of `re.match(pattern, url)` calls gets longer, refactoring looks more and more necessary.
Ideally, perhaps, I'd have a dictionary of patterns with their associated functions. Anyway, here's what I ended up with:
```python
import re


def get_command(url):
    """Return command for downloading a particular url."""
    base = ['wget', '--content-disposition', '--continue', '--quiet']
    commands = [
        # examplesite.com via torified wget and special referer
        (r'(https?://)?examplesite\.com/.+$',
         lambda u: ['/usr/bin/torify'] + base +
                   ['--referer=http://referersite.com', u]),
        # MEGA via megadl of megatools (fix url if necessary)
        (r'https://mega\.nz/.+$',
         lambda u: ['megadl', '--no-progress', u.replace('%23', '#', 1)]),
        # othersite.com via a dedicated download tool
        (r'(https?://)?othersite\.com/.+$',
         lambda u: ['downloadtool', u]),
    ]
    # first handler whose pattern matches, else fall back to plain wget
    return next(
        (f for p, f in commands if re.match(p, url)), lambda u: base + [u]
    )(url)
```
It doesn't seem like much of an improvement; if anything, it may be worse. Does anyone have a good way to refactor this?