Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I have an Excel file - a profit spreadsheet - which has one tab per product range. The tabs are defined and named, but apart from that, the file contains no data (yet). I wish to use openpyxl to:

  1. One day 1, parse data from a number of other spreadsheets containing expense data. Let's imagine that each expense spreadsheet contains different products from the rest.
  2. For each input spreadsheet, extract values from an expenses column, and subtract it from the the values in the reevnue column in a sales spreadsheet, for each product. The sales spreadsheet is a one-tab spreadsheet the contains sales values for all products for all product ranges, in no particular order. It could look something like:

    enter image description here

    When all is said and done, we end up with a value for profit, per product, per day.

  3. Populate the profit spreadsheet with one column per tab. Each column contains the profit per product for that day. One tab per range of products. The tab names will be the same as thee corresponding expense file names.

  4. On day 2 and each subsequent day ad infinitum (file will be opened for appending each day), repeat steps 2 and 3, adding a column per tab each time.

The resultant profit spreadsheet should look something like this (the Cars tab is shown, but all tabs will have an identical structure):

enter image description here

Since the cell values in Column A of each tab will be product names and will likely remain constant, the number of rows will also likely remain constant, per tab, day on day. As such, the product names may lend themselves to the keys of dicts, per product range (tab).

I set up some functions thus:

def parseExpenseSpreadsheet(productLineExpensesFilename):
    # some standard openpyxl code to extract the values in the Revenue column 
    # of the sales spreadsheet
    return productRangesExpensesDict # dict of product&expenses pairs

def parseSalesSpreadsheet(salesFilename):
    #similar to parseExpenseSpreadsheet() but returns productRangesSalesDict dict

def calculateProfitPerProductRange(productRangesExpensesDict, productRangesSalesDict)
    return y # dict of (sales minus profit per product) pairs

def appendToProfitSpreadsheet(profitSpreadsheet, expensesPerProductRange)
    addNewDayData(expensesPerProductRange)

def addNewDayData(expensesPerProductRange):
    for productExpensePair in expensesPerProductRange:
        profit[productExpensePair] = calculateProfitPerProductRange()[productExpensePair]
    # some standard openpyxl code to write the resultant
    # profit dict. appendToProfitSpreadsheet() will determine which tab to write the  
    # dict to.

and roll it out as follows:

def main():
    # some standard openpyxl code to open the profit spreadsheet for 
    # appending
    tabs = ['Cars', 'Bikes', 'Vans']
    for tab in tabs:
        expensesPerProductRange = parseExpenseSpreadsheet(tab)
        appendToProfitSpreadsheet(profitSpreadsheet, expensesPerProductRange)

Some of my design just doesn't smell right:

  • Assigning indices of one dict to indices of another profit[product] = calculateProfitPerProductRange()[product]

And in general it all feels a tad convoluted. How would you improve it?

share|improve this question

closed as off-topic by jonrsharpe, Quill, Snowhawk04, RobH, ferada Jul 2 at 8:49

This question appears to be off-topic. The users who voted to close gave this specific reason:

If this question can be reworded to fit the rules in the help center, please edit the question.