\$\begingroup\$

Is there a way I can link/associate an object (in this particular case a function) with a regular expression pattern?

Here's the function I'm working on:

def get_command(url):
    """
    Return command for downloading a particular url.

    >>> get_command('http://randomsite.com/XYZ')
    ['wget', '--content-disposition', '--continue', '--quiet', 'http://randomsite.com/XYZ']

    >>> get_command('http://examplesite.com/XYZ')
    ['/usr/bin/torify', 'wget', '--content-disposition', '--continue', '--quiet', '--referer=http://referersite.com', 'http://examplesite.com/XYZ']

    >>> get_command('https://mega.nz/%23!xd432Afds')
    ['megadl', '--no-progress', 'https://mega.nz/#!xd432Afds']

    >>> get_command('https://mega.nz/#!s2JHj1fds')
    ['megadl', '--no-progress', 'https://mega.nz/#!s2JHj1fds']

    >>> get_command('http://othersite.com/XYZ')
    ['downloadtool', 'http://othersite.com/XYZ']
    """
    import re

    base = ['wget', '--content-disposition', '--continue', '--quiet']

    # examplesite.com via torified wget and special referer
    if re.match(r'(https?://)?examplesite\.com/.+$', url):
        return ['/usr/bin/torify'] + base + \
            ['--referer=http://referersite.com', url]
    # MEGA via megadl of megatools
    elif re.match(r'https://mega\.nz/.+$', url):
        # fix url if necessary
        return ['megadl', '--no-progress', url.replace('%23', '#', 1)]
    # othersite.com via a dedicated download tool
    elif re.match(r'(https?://)?othersite\.com/.+$', url):
        return ['downloadtool', url]
    # default wget command
    else:
        return base + [url]

I think the code above is pretty straightforward, but it got me thinking about whether there's a good way to refactor it. As the if-chain of re.match(pattern, url) calls gets longer, a refactor starts to look more and more necessary.

Perhaps ideally I'd have a dictionary of patterns with their associated functions. Anyways here's what I ended up with:

def get_command(url):
    """Return command for downloading a particular url."""
    import re

    base = ['wget', '--content-disposition', '--continue', '--quiet']

    commands = [
        # examplesite.com via torified wget and special referer
        (r'(https?://)?examplesite\.com/.+$',
         lambda u: ['/usr/bin/torify'] + base +
         ['--referer=http://referersite.com', u]),
        # MEGA via megadl of megatools (fix url if necessary)
        (r'https://mega\.nz/.+$',
         lambda u: ['megadl', '--no-progress', u.replace('%23', '#', 1)]),
        # othersite.com via a dedicated download tool
        (r'(https?://)?othersite\.com/.+$',
         lambda u: ['downloadtool', u])
    ]

    return next(
        (f for p, f in commands if re.match(p, url)), lambda u: base + [u]
    )(url)

Doesn't seem like much of an improvement, potentially the opposite. Anyone have a good way to refactor this?
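For reference, one way to associate a function with a pattern is a small decorator-based registry. This is only a sketch: the names HANDLERS and handles, and the handler function names, are made up for illustration and are not part of any library. Patterns are compiled once and tried in registration order, so it should behave like the if-chain above.

```python
import re

BASE = ['wget', '--content-disposition', '--continue', '--quiet']

# Hypothetical registry: (compiled pattern, handler) pairs, kept in
# registration order so that earlier handlers win.
HANDLERS = []


def handles(pattern):
    """Register the decorated function as the handler for `pattern`."""
    def register(func):
        HANDLERS.append((re.compile(pattern), func))
        return func
    return register


@handles(r'(https?://)?examplesite\.com/.+$')
def torified_wget(url):
    # examplesite.com via torified wget and special referer
    return ['/usr/bin/torify'] + BASE + ['--referer=http://referersite.com', url]


@handles(r'https://mega\.nz/.+$')
def megadl(url):
    # MEGA via megadl of megatools (fix url if necessary)
    return ['megadl', '--no-progress', url.replace('%23', '#', 1)]


@handles(r'(https?://)?othersite\.com/.+$')
def downloadtool(url):
    # othersite.com via a dedicated download tool
    return ['downloadtool', url]


def get_command(url):
    """Return command for downloading a particular url."""
    for pattern, handler in HANDLERS:
        if pattern.match(url):
            return handler(url)
    # default wget command
    return BASE + [url]
```

Each handler sits next to its pattern, at the cost of a little more ceremony than the list of tuples.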

\$\endgroup\$

1 Answer

\$\begingroup\$

You should be able to see that you do lambda u: data + [u] on all of the commands. The one exception is u.replace('%23', '#', 1), which can be replaced by unquote (urllib.unquote in Python 2, urllib.parse.unquote in Python 3). That also decodes any other percent-encoded characters, so it will allow for more urls.
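As a quick illustration of the difference, here is a small sketch comparing the two approaches on the MEGA url from the question's doctest. The variable names are only for this example, and the last url with %20 in it is made up to show that unquote decodes every escape, not just %23.

```python
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2

mega_url = 'https://mega.nz/%23!xd432Afds'

# Both approaches decode the escaped '#' in the MEGA url.
replaced = mega_url.replace('%23', '#', 1)
unquoted = unquote(mega_url)

# A url that is already decoded passes through unchanged.
already_decoded = unquote('https://mega.nz/#!s2JHj1fds')

# unquote decodes every percent-escape, so '%20' becomes a space.
spaced = unquote('http://randomsite.com/a%20b')
```

Here replaced and unquoted both come out as 'https://mega.nz/#!xd432Afds', while spaced becomes 'http://randomsite.com/a b'.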

You should put base, commands and the re import into global scope. If you also change the commands so that they don't use lambdas and just hold the lists, then you get:

try:
    # Python 2
    from urllib import unquote
except ImportError:
    # Python 3
    from urllib.parse import unquote
import re

BASE = ['wget', '--content-disposition', '--continue', '--quiet']
COMMANDS = [
    (
        r'(https?://)?examplesite\.com/.+$',
        ['/usr/bin/torify'] + BASE + ['--referer=http://referersite.com']
    ),
    (
        r'https://mega\.nz/.+$',
        ['megadl', '--no-progress']
    ),
    (
        r'(https?://)?othersite\.com/.+$',
        ['downloadtool']
    )
]

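def get_command(url):
    """Return command for downloading a particular url."""
    # Decode percent-escapes (e.g. '%23' -> '#') before matching.
    url = unquote(url)
    # Use the first matching command prefix, falling back to plain wget.
    prefix = next((c for p, c in COMMANDS if re.match(p, url)), BASE)
    return prefix + [url]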
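
One thing to be aware of: the function above unquotes every url, including the ones handed to wget. If you would rather keep the original behaviour of only fixing MEGA urls, a possible variant is to store an optional fix-up function next to each command. This is a sketch under that assumption; RULES and identity are names invented for this example, and the patterns are precompiled so they are not re-parsed on every call.

```python
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2
import re

BASE = ['wget', '--content-disposition', '--continue', '--quiet']


def identity(url):
    """Leave the url untouched."""
    return url


# Hypothetical layout: (compiled pattern, command prefix, url fix-up)
RULES = [
    (
        re.compile(r'(https?://)?examplesite\.com/.+$'),
        ['/usr/bin/torify'] + BASE + ['--referer=http://referersite.com'],
        identity,
    ),
    (
        re.compile(r'https://mega\.nz/.+$'),
        ['megadl', '--no-progress'],
        unquote,  # only MEGA urls get decoded
    ),
    (
        re.compile(r'(https?://)?othersite\.com/.+$'),
        ['downloadtool'],
        identity,
    ),
]


def get_command(url):
    """Return command for downloading a particular url."""
    for pattern, prefix, fix in RULES:
        if pattern.match(url):
            return prefix + [fix(url)]
    # No rule matched: plain wget with the url as given.
    return BASE + [url]
```

With this layout a url such as 'http://randomsite.com/a%20b' reaches wget with its %20 intact, while MEGA urls are still decoded.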
\$\endgroup\$
  • \$\begingroup\$ Awesome, thanks. The unquote really did the trick! \$\endgroup\$
    – Six
    Commented Apr 12, 2016 at 13:04
