Code Review Stack Exchange is a question and answer site for peer programmer code reviews. Join them; it only takes a minute:

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

I have an excel file with 20+ separate sheets containing tables of data. My script iterates through each sheet, manipulates the data into the format I want it and then saves it to a final output file. I think it can be improved, and I've flagged sections in the code with "review this" which I think I've done more work than I've needed to. Any feedback or criticism would be awesome!

import openpyxl
import pandas as pd

path = 'C:/Desktop/Python/Excel Manipulation/'

wb = openpyxl.load_workbook(path + 'inputfile.xlsx')
sheets = wb.get_sheet_names()

CSVList = []

for sheet in sheets:

    #get the current active sheet
    active_sheet = wb.get_sheet_by_name(sheet)

    #count numbers of rows
    row_count = active_sheet.get_highest_row() - 1
    #count number of columns
    column_count = active_sheet.get_highest_column()

    count = 0
    values = []


    #write each row to a list, stop when reached max rows (REVIEW THIS - would have thought there was a better way than using a counter)
    while count <= row_count:               
        for i in active_sheet.rows[count]:
            values.append(i.value)      
        count = count + 1

    #split values list into tuples based on number of columns 
    split_rows = zip(*[iter(values)]*column_count)

    #convert list of tuples to list of lists  (REVIEW THIS - creating a tuple and then converting to list seems like extra work?!?)
    rows = [list(elem) for elem in split_rows]

    #get elements of file and store (REVIEW THIS - looks messy?)
    title = rows.pop(0)[0]
    headers = rows.pop(0)
    headers[1] = 'Last Year'
    rows.pop(0) 

    #create pandas dataframe
    df = pd.DataFrame(rows, columns=headers)

    #take header_id and remove to normalise the data
    header_id = headers.pop(2)


    normalise_data = pd.melt(df, id_vars=header_id, value_vars=headers, var_name='Measure', value_name='Value') 
    normalise_data.insert(0, 'Subject', title)  
    CSVList.append(normalise_data)


frame = pd.concat(CSVList)
frame.to_csv(path + 'CSV Outputs/' + 'final.csv', sep=',', index=False)
share|improve this question

Found a better way to iterate over the rows (although I still feel like I'm repeating myself!):

for idx, row in enumerate(active_sheet.rows):               
    for i in active_sheet.rows[idx]:
        values.append(i.value)

And instead of converting the tuple to a list:

rows = [list(t) for t in zip(*[iter(values)]*column_count)]
share|improve this answer
1  
Why not for i in row on your second for loop there? – SuperBiasedMan Aug 28 '15 at 16:31

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.