I am new to Python and I am writing my first utility as a way to learn about strings, files, etc. I am writing a simple utility using string replacement to batch output HTML files. The program takes as inputs a CSV file and an HTML template file and will output an HTML file for each data row in the CSV file.

CSV Input File: test1.csv

The CSV file, which has a header row, contains some catalog data, one product per row, like below:

stockID,color,material,url
340,Blue and magenta,80% Wool / 20% Acrylic,http://placehold.it/400
275,Purple,100% Cotton,http://placehold.it/600
318,Blue,100% Polyester,http://placehold.it/400x600

HTML Template Input File: testTemplate.htm

The HTML template file is simply a copy of the desired output with string replace tags %s placed at the appropriate locations:

<h1>Stock ID: %s</h1>
<ul>
    <li>%s</li>
    <li>%s</li>
</ul>
<img src='%s'>

The Python is pretty straightforward, I think. I open the template file and store it as a string. I then open the CSV file using csv.DictReader. I then iterate through the rows of the CSV, build the file names, and write the output files using string replacement on the template string with the values looked up by dictionary key.

import csv

# Open template file and pass string to 'data'.  Should be in HTML format except with string replace tags.
with open('testTemplate.htm', 'r') as myTemplate:
    data = myTemplate.read()
    # print template for visual cue.
    print('Template passed:\n' + '-'*30 +'\n' + data)
    print('-'*30)

# open CSV file that contains the data and store to a dictionary 'inputFile'.
with open('test1.csv') as csvfile:
    inputFile = csv.DictReader(csvfile)
    x = 0 # counter to display file count
    for row in inputFile:
        # create filenames for the output HTML files
        filename = 'listing'+row['stockID']+'.htm'
        # print filenames for visual cue.
        print(filename)
        x = x + 1 
        # create output HTML file.
        with open(filename, 'w') as outputFile:
            # run string replace on the template file using items from the data dictionary
            # HELP--> this is where I get nervous because chaos will reign if the tags get mixed up
            # HELP--> is there a way to add identifiers to the tags?  like %s1 =row['stockID'], %s2=row['color'] ... ???
            outputFile.write(data %(row['stockID'], row['color'], row['material'], row['url']))

# print the number of files created as a cue program has finished.
print('-'*30 +'\n' + str(x) + ' files created.')

The program works as expected with the test files I have been using (which is why I am posting here and not on SO). My concern is that it seems pretty fragile. In 'production' the CSV file will contain many more columns (around 30-40) and the HTML will be much more complex, so the chances of one of the tags in the string replace getting mixed up seem pretty high. Is there a way to add identifiers to the tags, like %s1 = row['stockID'], %s2 = row['color'] ..., that could be placed either in the template file or in the write() statement (or both)? Any alternative methods or improvements I could learn would be great (note I am well aware of the Makos and Mustaches of the world and plan to learn a couple of template packages soon).

Look into a proper HTML templating engine. –  CodesInChaos 4 hours ago
    
Thanks @Codes I do plan to learn a couple of templating packages like I mentioned. Any recommendations? Right now I am thinking of learning Mako and Mustache both just for fun. –  Christopher Pearson 55 mins ago

2 Answers

Style

Python has a style guide called PEP 8. Among many other great things, it gives guidelines about spacing that you do not follow. Indeed, your spacing is quite inconsistent. Tools such as pep8 can check your compliance with PEP 8, and tools such as autopep8 can fix your code automatically.

It can be a good habit to move the part of your program that does things (as opposed to the part that defines things) behind an if __name__ == "__main__" guard.
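
As a minimal sketch of such a guard (the function name main and its return value are placeholders for illustration, not part of the original program):

```python
def main():
    # Everything that "does things" lives here.
    return 'running as a script'


if __name__ == '__main__':
    # Runs only when the file is executed directly, not when it is imported.
    print(main())
```

Importing this file from another module defines main() but does not call it.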

You can also use tools such as pylint to check your code. Among other things, it checks naming: in the revised code below, Python naming conventions are now followed.

Don't repeat yourself / avoid magic numbers

I can see 30 in multiple places. This is usually a bad sign: if you ever want to change the value to something else, you'll have to change it in multiple places. You should probably define a constant to hold that value behind a meaningful name.

Even better, you could define a function to perform the particular behavior that you want, such as the print_with_line helper in the revised code below.

Getting the length the right way

At the moment, you are keeping track of the number of rows in input_file by incrementing a variable x. It is much clearer to use len(). Note that a csv.DictReader has no len(), so the rows must first be read into a list, for example with list(input_file). Also, x = x + 1 can simply be written x += 1.
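
A rough sketch of two ways to get the count, assuming the sample data from the question (io.StringIO stands in for the real file here, and the variable names are only illustrative):

```python
import csv
import io

# Stand-in for open('test1.csv'); the data is copied from the question.
csv_text = (
    'stockID,color,material,url\n'
    '340,Blue and magenta,80% Wool / 20% Acrylic,http://placehold.it/400\n'
    '275,Purple,100% Cotton,http://placehold.it/600\n'
    '318,Blue,100% Polyester,http://placehold.it/400x600\n'
)

# Option 1: read every row into a list, which does support len().
rows = list(csv.DictReader(io.StringIO(csv_text)))
file_count = len(rows)

# Option 2: count while iterating, without holding all rows in memory.
streamed_count = 0
reader = csv.DictReader(io.StringIO(csv_text))
for streamed_count, row in enumerate(reader, start=1):
    pass  # write the output file for this row here

print(file_count, streamed_count)
```

The first option is the simplest for a small catalog; the second avoids loading the whole file if the CSV ever gets large.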

Taking these various comments into account, you get:

import csv

SIZE_LINE = 30


def print_with_line(s):
    print(s)
    print('-' * SIZE_LINE)


if __name__ == '__main__':

    # Open template file and pass string to 'data'.
    # Should be in HTML format except with string replace tags.
    with open('testTemplate.htm', 'r') as my_template:
        data = my_template.read()
        # print template for visual cue.
        print_with_line('Template passed:')
        print_with_line(data)

    # open CSV file that contains the data and
    # read its rows into a list of dictionaries 'input_rows'
    # (csv.DictReader itself has no len(), a list does).
    with open('test1.csv') as csv_file:
        input_rows = list(csv.DictReader(csv_file))
        for row in input_rows:
            # create filenames for the output HTML files
            filename = 'listing' + row['stockID'] + '.htm'
            # print filenames for visual cue.
            print(filename)
            # create output HTML file.
            with open(filename, 'w') as output_file:
                # run string replace on the template file
                # using items from the data dictionary
                # HELP--> this is where I get nervous because
                # chaos will reign if the tags get mixed up
                # HELP--> is there a way to add identifiers to
                # the tags?  like %s1 =row['stockID'], %s2=row['color'] ... ???
                output_file.write(data % (
                    row['stockID'],
                    row['color'],
                    row['material'],
                    row['url']))

    # print the number of files created as a cue program has finished.
    print_with_line(str(len(input_rows)) + ' files created.')
Thanks for the useful info @Josay. Considering this was my FIRST program, I was more focused on figuring things out than worrying about style. Your len(input_file) does not work: Traceback (most recent call last): File "C:\Code\anotherTry.py", line 45, in <module> print_with_line(str(len(input_file)) + ' files created.') TypeError: object of type 'DictReader' has no len() –  Christopher Pearson 15 hours ago
    
Ah! I should have tried >_< I'll try and have a look in a few hours. Sorry for the inconvenience –  Josay 15 hours ago
    
Can you explain what the if __name__ == '__main__': does? –  Christopher Pearson 15 hours ago
There is a link to an explanation. Basically, the code behind the guard only gets executed when your file is run as a script (and not when it is imported as a module, for instance). If you want to write reusable code, you have to use this so that modules can be imported without side effects. –  Josay 15 hours ago

Python has a number of templating options, but the simplest to start is probably the string.Template one described in https://docs.python.org/3/library/string.html#template-strings

This supports named targets such as $StockId and is used as below:

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
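
Applied to the question's files, this might look roughly like the sketch below. It assumes the template is rewritten to use placeholders named after the CSV column headers; the inline strings stand in for testTemplate.htm and test1.csv, and the pages dictionary is only there to keep the example self-contained instead of writing files.

```python
import csv
import io
from string import Template

# Template with named placeholders instead of positional %s tags.
template = Template(
    "<h1>Stock ID: $stockID</h1>\n"
    "<ul>\n"
    "    <li>$color</li>\n"
    "    <li>$material</li>\n"
    "</ul>\n"
    "<img src='$url'>\n"
)

# Stand-in for open('test1.csv'); one data row copied from the question.
csv_text = (
    'stockID,color,material,url\n'
    '275,Purple,100% Cotton,http://placehold.it/600\n'
)

pages = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    # substitute() accepts a mapping; a placeholder with no matching
    # column raises KeyError instead of silently shifting the values.
    pages['listing' + row['stockID'] + '.htm'] = template.substitute(row)

print(pages['listing275.htm'])
```

Because each placeholder is matched by name, the order of the columns in the CSV no longer matters. A literal dollar sign in the template has to be written as $$.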

If you need more output options, look at the str.format functionality, but string.Template is probably the best one to start with.
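
For comparison, here is a small sketch of named fields with str.format, and of the named form of the %-formatting already used in the question. The row dictionary is hard-coded from the sample data, and the shortened templates are only illustrative.

```python
row = {
    'stockID': '340',
    'color': 'Blue and magenta',
    'material': '80% Wool / 20% Acrylic',
    'url': 'http://placehold.it/400',
}

# str.format with named fields; format_map() takes the row dict directly.
format_template = "<h1>Stock ID: {stockID}</h1><img src='{url}'>"
formatted = format_template.format_map(row)

# %-formatting also accepts named tags when given a dictionary.
percent_template = "<h1>Stock ID: %(stockID)s</h1><li>%(material)s</li>"
percent_formatted = percent_template % row

print(formatted)
print(percent_formatted)
```

One thing to watch for with real HTML: str.format needs literal braces doubled ({{ and }}), which is awkward with inline CSS or JavaScript, and %-formatting needs literal percent signs in the template written as %%.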

@Gwen. Yep, this is exactly what I needed. I found the great resource PEP 292. I am currently rewriting code per this PEP and will post my solution when complete (and give you the green checkmark of course!). –  Christopher Pearson 46 mins ago
