Simple Batch Templating Utility in Python

Question

I would like to present for review my (much) revised batch templating utility, which had its humble beginnings in a previous post. As I mentioned there, this program is my entry into Python programming. I am trying to grow the simple script from that post into a more robust utility.

Questions I am hoping to answer with this post:

Is the overall structure sound?
Is my use of exceptions correct?
Is my documentation OK? This was my first intro into docstrings and I have worked hard to make them as complete as possible.
One part that bugs me is the try block in the main() function.
- First, this whole block should probably be a separate function? I left it in main() since it's the meat of the program.
- Secondly, there seems to be a lot of code between the try: and the except: and I know this should be minimized, but I couldn't come up with a better method.

Program Inputs

The program takes two required input files, a CSV data file and a template file, plus an optional appended file. The output of the program is a set of rendered files, one for each data row in the CSV file. The CSV data file contains a header row, which is used as a set of keys mapped to tags inside the template file. For every row in the data file, each data item is substituted for the template tag that matches its key, the appended file is added (wrapped in script tags for *.js files), and the rendered file is written to disk. Pretty straightforward, I think. The main docstring illustrates a quick example.

Template Syntax

The program uses Python's string.Template class for string substitution, which uses the $ replacement syntax. The program additionally requires the { and } curly braces, which string.Template itself treats as optional. So, for a particular Key from the data file header row, the template tag would be ${Key}.
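As a minimal sketch of that syntax: the template text and data row here are made up for illustration, and the pattern uses \w+ rather than the \S+ found in the program below, which is an assumption about what a key may contain.

```python
import re
import string

# Hypothetical template text and data row, for illustration only.
template_text = "<h1>Stock ID: ${stockID}</h1>\n<li>${color}</li>"
row = {"stockID": "340", "color": "Blue"}

# Collect the braced tags; a raw string keeps the backslashes literal.
tags = set(re.findall(r"\$\{(\w+)\}", template_text))

# substitute() raises KeyError if a tag has no matching key in the row.
rendered = string.Template(template_text).substitute(row)

print(sorted(tags))
print(rendered)
```

Extra keys in the row are ignored by substitute(), so only a tag without a key causes an error.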

Wall of Code

I think the docstrings explain pretty well what all is going on...

"""
A simple batch templating utility for Python.

Initially conceived as a patch to quickly generate small HTML files from
catalog data.  The program takes as inputs two (2) required files, a CSV
data file and a template file (see below) and the option to append a
third file.  Output of the program is a set of rendered files, one for
each data row in the CSV data file.

USAGE:
    Current rendition of program uses a simple guided prompt interface
    to walk user through process.

**SPECIAL WARNING**
    This program copies template file and appended file to strings which
    means they will both be loaded fully into memory.  Common sense
    should be exercised when dealing with extremely large files.

CSV DATA FILE:
    Data File shall contain a header row.  Header row contains the keys
    that will be used to render the output files.  Keys shall not
    contain spaces.  There shall be a corresponding tag in the template
    file for each key in the CSV Data File.

    File can contain any (reasonable) number of data rows and columns.
    Each item in a row is swapped out with the tag in the template file
    which corresponds to appropriate key from the header row.  There
    will be one output file generated for each row in the data file.

TEMPLATE FILE:
    The template file is basically a copy of the desired output file
    with tags placed wherever a particular piece of data from the CSV
    Data File should be placed in the output.

    Syntax:
    The program uses Python's string.Template() string substitution
    method which utilizes the `$` replacement syntax.  The program
    further restricts the syntax requiring the use of the optional `{`
    and `}` curly braces surrounding tags. So, for a particular 'Key'
    from the data file header row, the template tag would be ${Key}.

APPENDED FILE:
    The appended file is strictly copied _verbatim_ to the end of the
    rendered output file.  There is really no restriction on the
    appended file other than special warning above.

    Special Feature:
    If the appended file is a Javascript file (detected using the *.js
    file extension), the program will add appropriate opening and
    closing HTML tags.

QUICK EXAMPLE:
    Assume CSV Data File: <some_file.csv>
        stockID,color,material,url
        340,Blue,80% Wool / 20% Acrylic,http://placehold.it/400
        275,brown,100% Cotton,http://placehold.it/600

    Assume Template File: <another_file.html>
        <h1>Stock ID: ${stockID}</h1>
        <ul>
            <li>${color}</li>
            <li>${material}</li>
        </ul>
        <img src='${url}'>

    Assume ...Appended File? --> No

    Output file 1 = 'Listing_340.html'
        <h1>Stock ID: 340</h1>
        <ul>
            <li>Blue</li>
            <li>80% Wool / 20% Acrylic</li>
        </ul>
        <img src='http://placehold.it/400'>

    Output file 2 = 'Listing_275.html'
        <h1>Stock ID: 275</h1>
        <ul>
            <li>brown</li>
            <li>100% Cotton</li>
        </ul>
        <img src='http://placehold.it/600'>


Author: Chris E. Pearson (christoper.e.pearson.1 at gmail dot com)
Copyright (c) Chris E. Pearson, 2015
License: TBD
"""

import os
import re
import csv
import string


def main():
    """
    A simple batch templating utility for Python.

    See main docstring for details.
    """
    # Collect input file names and contents for text files.
    fname_data = prompt_filename('Data File')
    fname_template = prompt_filename('Template File')
    fcontents_template = get_contents(fname_template)
    fname_appended, fcontents_appended = get_appended()

    # Validate the inputs
    tag_set = set(re.findall(r'\$\{(\S+)\}', fcontents_template))
    primary_key, key_set = get_keys(fname_data)
    validate_inputs(tag_set, key_set)
    validated_template = string.Template(fcontents_template)
    # Generate the output
    try:
        # This seems like a lot to put in a try statement...?
        with open(fname_data) as f:
            reader = csv.DictReader(f)
            f_count = 0
            for row in reader:
                # Create output filename
                output_filename = ('Listing_{}.html'.format(row[primary_key]))
                f_count += 1
                print('File #{}: {}'.format(f_count, output_filename))
                # Prep string
                output_main = validated_template.substitute(row)
                write_string = '{}{}'.format(output_main, fcontents_appended)
                # Write File
                with open(output_filename, 'w') as f_out:
                    f_out.write(write_string)
    except OSError:
        print('No such file {!r}.  Check file name and path and try again.'
              .format(fname_data))
        raise
    else:
        print('{} of {} files created'.format(str(f_count),
              str(reader.line_num-1)))


def prompt_filename(fclass):
    """
    Prompt user for a filename for given file classification.

    Args:
        fclass (string):
        A descriptive string describing the type of file for which the
        filename is requested. _e.g._ 'Template File'

    Returns:
        filename (string)
    """
    while True:
        filename = input('Enter {0} --> '.format(fclass))
        if os.path.isfile(filename):
            return filename
        else:
            print('No such file: {!r}.'.format(filename))
            print('Please enter a valid file name')
            continue


def get_contents(fname):
    """
    Return contents of file `fname` as a string if file exists.

    Args:
        fname (string):
        Name of the file to be opened and returned as a string.

    Returns:
        text_file (string):
        The entire contents of `fname` read in as a string.

    Exceptions:
        OSError: informs user that fname is invalid.
    """
    try:
        with open(fname) as f:
            text_file = f.read()
    except OSError:
        print('No such file {!r}.  Check file name and path and try again.'
              .format(fname))
        raise
    else:
        return text_file


def get_appended():
    """
    Ask user if appended file and prompt filename if so.

    Returns:
        fname_appended (string)
        Filename for appended file.

        fcontents_appended (string)
        The entire contents of `fname_appended` as a string.

    Exceptions:
        OSError: Raised by function prompt_filename informs user that
        fname is invalid.

    See Also:
        Function: prompt_filename
        Function: get_contents
    """
    prompt_for_appended = input('Is there an appended file? --> ')
    if prompt_for_appended.lower().startswith('y'):
        fname_appended = prompt_filename('Appended File')
        fcontents_appended = get_contents(fname_appended)
        if fname_appended.lower().endswith('.js'):
            open_tag = '<script type="text/javascript">'
            close_tag = '</script>'
            fcontents_appended = '\n{0}\n{1}\n{2}'.format(open_tag,
                                                          fcontents_appended,
                                                          close_tag)
    else:
        fname_appended = None
        fcontents_appended = ''

    return fname_appended, fcontents_appended


def get_keys(fname):
    """
    Get key set as header row of given CSV file and get primary key.

    Given a CSV data file `fname`, return the header row from file
    as a set of "keys".  Also return the primary key for the data file.
    The primary key is simply the header for the first column.

    Args:
        fname (string):
        Name of the CSV file for which the keys are needed.

    Returns:
        primary_key (string)
        Header value of first column in given CSV file.

        key_set (set of strings)
        A set comprised of all header row values for given CSV file.

    Exceptions:
        OSError: informs user that fname is invalid.
    """
    try:
        with open(fname) as f:
            key_list = f.readline().strip().split(',')
    except OSError:
        print('No such file {!r}.  Check file name and path and try again.'
              .format(fname))
        raise
    else:
        primary_key = key_list[0]
        key_set = set(key_list)
        return primary_key, key_set


def validate_spaces(item_set):
    """
    Read through a set of strings and checks for spaces.

    The function takes a set of strings and searches through each string
    looking for spaces.  If a space is found, string is appended to a
    list.  Once all strings are searched, if any spaces found, print
    error with generated list and terminate program.

    Args:
        item_set (set of strings)

    Returns:
        None

    Exceptions:
        A `KeyingError` is raised if any spaces are detected in the data
        file key set.
    """
    bad_items = []
    for item in item_set:
        if ' ' in item:
            bad_items.append(item)
    if bad_items != []:
        try:
            raise KeyingError('Keys cannot contain spaces.')
        except KeyingError as e:
            print(e)
            print('Please correct these keys:\n', bad_items)
            # quit()
            raise


def validate_inputs(tag_set, key_set):
    """
    Validate template tag_set against data file key_set.

    Validates the key_set from a given data file against the tag_set
    from the corresponding template file, first checking the key set for
    lack of spaces and then checking if the two sets are equivalent. If
    either condition is not met, an exception will be raised and the
    program will terminate.

    Args:
        tag_set (set of strings)

        key_set (set of strings)

    Returns:
        None

    Exceptions:
        A `KeyingError` is raised by function `validate_spaces` if any
        spaces are detected in the data file key set.

        A `MisMatchError` is raised if the two input sets are not
        equivalent.

    See also:
        Function: validate_spaces
    """
    try:
        validate_spaces(key_set)
    except KeyingError as e:
        print('Goodbye')
        quit()
    if key_set != tag_set:
        try:
            raise MisMatchError('Tags and keys do not match')
        except MisMatchError as e:
            print(e)
            if tag_set - key_set == set():
                print('missing tags for key(s):', key_set - tag_set)
                print('(or tag(s) contains spaces)')
            else:
                print('Check template file tags for key(s):',
                      key_set - tag_set)
                print('Template shows:', tag_set - key_set)
            print('Goodbye')
            quit()


class KeyingError(Exception):
    def __init__(self, arg):
        self.arg = arg


class MisMatchError(Exception):
    def __init__(self, arg):
        self.arg = arg


if __name__ == '__main__':

    main()

tsleyson · Answer 1 · 2015-03-20 08:12:29Z

Regarding the code inside the try block in main, rather than trying to minimize that code, I would think about what other exceptions you might be able to catch and handle at that point. Try doing some of the other calls from that block of code in the interactive prompt, with invalid data, and see what they toss at you. I don't think it's too much code anyway. I wouldn't put it in a separate function; while it may be several lines of code, it feels "mainy". As you say, it's the meat, or maybe more like the backbone: it provides the structure that binds together all the other code. I think it's fine where it is.

I see some issues with your use of exceptions. Regarding the code inside prompt_filename:

def prompt_filename(fclass):
    """
    Prompt user for a filename for given file classification.

    Args:
        fclass (string):
        A descriptive string describing the type of file for which the
        filename is requested. _e.g._ 'Template File'

    Returns:
        filename (string)
    """
    while True:
        filename = input('Enter {0} --> '.format(fclass))
        if os.path.isfile(filename):
            return filename
        else:
            print('No such file: {!r}.'.format(filename))
            print('Please enter a valid file name')
            continue

There's a precept in Python, EAFP, which stands for "Easier to Ask Forgiveness than Permission". What it means is that Python programmers tend not to check things with conditionals, like doing if os.path.isfile(filename). The style in Python is more to assume everything is good, and let the program throw an exception if it's not good. In this case, you would just try to open the file; if there's no such file, open() raises OSError (IOError is an alias for it in Python 3), and you catch that and complain. Sometimes you do want to ask permission, but I think this is a case where it's easier to ask forgiveness.

This is something I also see elsewhere in your code. It's good to be safe, but in Python, people tend to really lean more heavily on exceptions than on explicit conditional checks in most cases. The case where you do use a conditional is when there are multiple possibilities, all of which are valid, and you need to figure out which case you're in. But if something is wrong or invalid or unexpected, like the passed file name not being a real file, I recommend exceptions.
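Here's a rough sketch of what an EAFP version of the prompt could look like. The name prompt_contents and the ask parameter are hypothetical, not part of the original program; ask defaults to input() and is there only so the loop can be exercised without a terminal.

```python
def prompt_contents(fclass, ask=input):
    """Prompt until a readable file is named; return (filename, contents).

    `prompt_contents` and `ask` are hypothetical names used for this sketch.
    """
    while True:
        filename = ask('Enter {0} --> '.format(fclass))
        try:
            # Just try to open it; no os.path.isfile() check first.
            with open(filename) as f:
                return filename, f.read()
        except OSError as e:
            # Covers missing files, directories, permission problems, etc.
            print('Cannot read {!r}: {}'.format(filename, e))
            print('Please enter a valid file name')
```

One side effect of this shape is that the file is only opened once, so there is no gap between checking that it exists and reading it.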

(By the way, you don't need continue in your else clause. What continue does is skip over any code that comes after it to go on to the next iteration of the loop. In this case, there is no code after continue, so it would always go to the next iteration anyway.)

In both get_keys and get_contents, you have some code like

try:
    with open(fname) as f:
        text_file = f.read()
except OSError:
    print('No such file {!r}.  Check file name and path and try again.'
          .format(fname))
    raise

Rather than print a message from inside this function, I would probably re-raise with a new message:

except OSError:
    raise OSError('No such file {!r}.  Check file name and path and try again.'.format(fname))

For the most part, I don't believe in catching exceptions unless you're going to do something about them. But re-throwing with more specific info is a perfectly valid thing to do. Also, I don't like to have functions other than main printing to the console. I'd prefer to re-throw with a new message, catch the exception in main, and print the message.
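A minimal sketch of that arrangement, assuming a simplified main() that takes the file name as a parameter (the original prompts for it instead). The `from e` part is my addition; it keeps the original error attached to the new one.

```python
import sys


def get_contents(fname):
    """Return the contents of `fname`, re-raising with a friendlier message."""
    try:
        with open(fname) as f:
            return f.read()
    except OSError as e:
        # `from e` keeps the original error available as __cause__.
        raise OSError('Cannot read {!r}. Check file name and path and '
                      'try again.'.format(fname)) from e


def main(fname):
    # Hypothetical main(): the only place that prints or picks an exit code.
    try:
        text = get_contents(fname)
    except OSError as e:
        print(e, file=sys.stderr)
        return 1
    print('Read {} characters'.format(len(text)))
    return 0
```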

Related to this, I see the following code in validate_inputs:

try:
    validate_spaces(key_set)
except KeyingError as e:
    print('Goodbye')
    quit()

I would prefer not to catch the KeyingError here. Letting an exception go uncaught will just stop the whole program, which seems to be what you wanted. Some other languages force you to catch or declare every exception, but Python will just bring down the whole program around you. That's not what you want for production code, but that is absolutely what you want for development: anything anomalous will make the program crash and die right away, with a reference to the line number where the crashing and dying occurred. As a bonus, it's quicker and easier to write the code that way, because you don't have to add try/except blocks around everything. If you can do something about the invalid input, then definitely catch the exception and do something. But if all you can do is say "You screwed up, fix it", then why not just let the exception be thrown?

This piece of code from validate_spaces could be a lot shorter and cleaner:

if bad_items != []:
    try:
        raise KeyingError('Keys cannot contain spaces.')
    except KeyingError as e:
        print(e)
        print('Please correct these keys:\n', bad_items)
        # quit()
        raise

I think it would look better like this:

if bad_items:
    raise KeyingError("Keys cannot contain spaces. Please correct: {}".format(bad_items))

The empty list is falsey, while the non-empty list is truthy, so writing if bad_items is equivalent to testing if the bad_items list is non-empty.

There's just not much point in throwing and catching an exception in the same function. If you really wanted to print a message and die inside this function, just do it, without throwing an exception. But in an application like this, I prefer my error handling to be mostly in main. It's just easier if your application has a single valid exit point. For example, if the application has multiple exit points and you have a bug where it's exiting anomalously, then to figure out why, you have to monitor all of those exit points. You might not even realize right away that it exited anomalously.
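Putting those pieces together, here's a sketch of how the validation could raise and leave the handling to main. The main() signature is hypothetical (the real one gathers the sets from files), and the exception classes are simplified to plain subclasses with no custom __init__.

```python
import sys


class KeyingError(Exception):
    """A data file key contains a space."""


class MisMatchError(Exception):
    """Template tags and data file keys differ."""


def validate_spaces(item_set):
    bad_items = sorted(item for item in item_set if ' ' in item)
    if bad_items:  # an empty list is falsy, so this means "any bad items"
        raise KeyingError('Keys cannot contain spaces. Please correct: {}'
                          .format(bad_items))


def validate_inputs(tag_set, key_set):
    validate_spaces(key_set)  # no try/except: let KeyingError propagate
    if key_set != tag_set:
        raise MisMatchError('Tags and keys do not match. Keys without tags: '
                            '{}; tags without keys: {}'
                            .format(sorted(key_set - tag_set),
                                    sorted(tag_set - key_set)))


def main(tag_set, key_set):
    # Hypothetical signature, for illustration only.
    try:
        validate_inputs(tag_set, key_set)
    except (KeyingError, MisMatchError) as e:
        print(e, file=sys.stderr)
        return 1  # the single place where failure becomes an exit status
    return 0
```

With this layout, the script entry point can be sys.exit(main(...)), so every failure leaves the program through the same line.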

A program like this, that seems to be a command line utility, is probably better served by taking command line arguments than by interactively reading filenames. That's the next direction I'd go in. The simple way to do this is to read sys.argv. If you've ever done bash, sys.argv[0] is the script name, just like $0, and sys.argv[1:] are the positional arguments passed on the command line, just like $1, $2, etc.:

python templater.py some_file.csv another_file.html

would call templater.py with "some_file.csv" as the value of sys.argv[1] and "another_file.html" as the value of sys.argv[2].
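As a small sketch of reading the two required arguments: required_args is a hypothetical helper, and it takes the argument list as a parameter so it can be tried out without touching the real sys.argv.

```python
import sys


def required_args(argv):
    """Return (data_file, template_file) from an argv-style list.

    `required_args` is a hypothetical helper; argv[0] is the script name,
    just as in sys.argv.
    """
    if len(argv) < 3:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit('usage: {} DATA_CSV TEMPLATE_FILE'.format(argv[0]))
    return argv[1], argv[2]


if __name__ == '__main__':
    data_file, template_file = required_args(sys.argv)
    print(data_file, template_file)
```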

The more complex way to do this is to use the argparse module from the standard library. If you're sticking with all positional arguments, then reading sys.argv directly is probably fine. You can do something like

try:
    append_file = sys.argv[3]
except IndexError:
    append_file = None  # Optional argument not passed

to check whether optional arguments were passed. But argparse really shows its value if you want to have options and switches, which are a pain with sys.argv and near impossible with interactive input. (You have to either have a config file somewhere, or annoy the user every time with "Do you want gold-plating? (y/n)".)
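For comparison, a minimal argparse sketch. The argument names and the -v/--verbose switch are hypothetical choices for illustration; parse_args is given an explicit list here only so the example runs on its own, and calling it with no arguments reads sys.argv instead.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description='Render one output file per CSV data row.')
    parser.add_argument('data_file', help='CSV data file with a header row')
    parser.add_argument('template_file', help='template file containing tags')
    # nargs='?' makes this positional argument optional.
    parser.add_argument('appended_file', nargs='?', default=None,
                        help='optional file appended to each output')
    # A hypothetical switch, to show what argparse adds over sys.argv.
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print each file name as it is written')
    return parser


args = build_parser().parse_args(['some_file.csv', 'another_file.html', '-v'])
print(args.data_file, args.template_file, args.appended_file, args.verbose)
```

argparse also generates the -h/--help text and the usage error messages, which would otherwise have to be written by hand.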

To end on a positive note, I have nothing but good things to say about your use of docstrings, especially the ones on your functions. This is exactly the kind of excellent docstring that Clojure has and that Python mostly lacks, at least in the standard library.

This is great stuff. Just what I needed. I am definitely going to work on the error handling. Also, I think sys.argv will work great. Hadn't gotten to that section yet in the library (it's section 29). I am also going to look at the argparse module, as I may want to add some switches like a -v verbose mode or maybe change syntax for specific file types (the ${} syntax certainly doesn't play nice with JavaScript). Thx! - Christopher Pearson, Mar 20 at 18:25
