I have an Excel file - a profit spreadsheet - which has one tab per product range. The tabs are defined and named, but apart from that, the file contains no data (yet). I wish to use openpyxl
to:
- One day 1, parse data from a number of other spreadsheets containing expense data. Let's imagine that each expense spreadsheet contains different products from the rest.
For each input spreadsheet, extract values from an expenses column, and subtract it from the the values in the reevnue column in a sales spreadsheet, for each product. The sales spreadsheet is a one-tab spreadsheet the contains sales values for all products for all product ranges, in no particular order. It could look something like:
When all is said and done, we end up with a value for profit, per product, per day.
Populate the profit spreadsheet with one column per tab. Each column contains the profit per product for that day. One tab per range of products. The tab names will be the same as thee corresponding expense file names.
- On day 2 and each subsequent day ad infinitum (file will be opened for appending each day), repeat steps 2 and 3, adding a column per tab each time.
The resultant profit spreadsheet should look something like this (the Cars tab is shown, but all tabs will have an identical structure):
Since the cell values in Column A of each tab will be product names and will likely remain constant, the number of rows will also likely remain constant, per tab, day on day. As such, the product names may lend themselves to the keys of dicts
, per product range (tab).
I set up some functions thus:
def parseExpenseSpreadsheet(productLineExpensesFilename):
# some standard openpyxl code to extract the values in the Revenue column
# of the sales spreadsheet
return productRangesExpensesDict # dict of product&expenses pairs
def parseSalesSpreadsheet(salesFilename):
#similar to parseExpenseSpreadsheet() but returns productRangesSalesDict dict
def calculateProfitPerProductRange(productRangesExpensesDict, productRangesSalesDict)
return y # dict of (sales minus profit per product) pairs
def appendToProfitSpreadsheet(profitSpreadsheet, expensesPerProductRange)
addNewDayData(expensesPerProductRange)
def addNewDayData(expensesPerProductRange):
for productExpensePair in expensesPerProductRange:
profit[productExpensePair] = calculateProfitPerProductRange()[productExpensePair]
# some standard openpyxl code to write the resultant
# profit dict. appendToProfitSpreadsheet() will determine which tab to write the
# dict to.
and roll it out as follows:
def main():
# some standard openpyxl code to open the profit spreadsheet for
# appending
tabs = ['Cars', 'Bikes', 'Vans']
for tab in tabs:
expensesPerProductRange = parseExpenseSpreadsheet(tab)
appendToProfitSpreadsheet(profitSpreadsheet, expensesPerProductRange)
Some of my design just doesn't smell right:
- Assigning indices of one dict to indices of another
profit[product] = calculateProfitPerProductRange()[product]
And in general it all feels a tad convoluted. How would you improve it?