Split excel file with multiple sheets, manipulate the data and create final out file

Question

I have an excel file with 20+ separate sheets containing tables of data. My script iterates through each sheet, manipulates the data into the format I want it and then saves it to a final output file. I think it can be improved, and I've flagged sections in the code with "review this" which I think I've done more work than I've needed to. Any feedback or criticism would be awesome!

import openpyxl
import pandas as pd

path = 'C:/Desktop/Python/Excel Manipulation/'

wb = openpyxl.load_workbook(path + 'inputfile.xlsx')
sheets = wb.get_sheet_names()

CSVList = []

for sheet in sheets:

    #get the current active sheet
    active_sheet = wb.get_sheet_by_name(sheet)

    #count numbers of rows
    row_count = active_sheet.get_highest_row() - 1
    #count number of columns
    column_count = active_sheet.get_highest_column()

    count = 0
    values = []


    #write each row to a list, stop when reached max rows (REVIEW THIS - would have thought there was a better way than using a counter)
    while count <= row_count:               
        for i in active_sheet.rows[count]:
            values.append(i.value)      
        count = count + 1

    #split values list into tuples based on number of columns 
    split_rows = zip(*[iter(values)]*column_count)

    #convert list of tuples to list of lists  (REVIEW THIS - creating a tuple and then converting to list seems like extra work?!?)
    rows = [list(elem) for elem in split_rows]

    #get elements of file and store (REVIEW THIS - looks messy?)
    title = rows.pop(0)[0]
    headers = rows.pop(0)
    headers[1] = 'Last Year'
    rows.pop(0) 

    #create pandas dataframe
    df = pd.DataFrame(rows, columns=headers)

    #take header_id and remove to normalise the data
    header_id = headers.pop(2)


    normalise_data = pd.melt(df, id_vars=header_id, value_vars=headers, var_name='Measure', value_name='Value') 
    normalise_data.insert(0, 'Subject', title)  
    CSVList.append(normalise_data)


frame = pd.concat(CSVList)
frame.to_csv(path + 'CSV Outputs/' + 'final.csv', sep=',', index=False)

ashleh · Answer 1 · 2015-05-14 15:46:35Z

up vote 1 down vote

Found a better way to iterate over the rows (although I still feel like I'm repeating myself!):

for idx, row in enumerate(active_sheet.rows):               
    for i in active_sheet.rows[idx]:
        values.append(i.value)

And instead of converting the tuple to a list:

rows = [list(t) for t in zip(*[iter(values)]*column_count)]

edited May 14 '15 at 15:46

answered May 14 '15 at 15:33

ashleh

7017

1

Why not for i in row on your second for loop there? – SuperBiasedMan Aug 28 '15 at 16:31

add a comment |

asked	1 year ago
viewed	1451 times
active	8 months ago

current community

your communities

more stack exchange communities

Split excel file with multiple sheets, manipulate the data and create final out file

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged python csv excel pandas or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Split excel file with multiple sheets, manipulate the data and create final out file

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python csv excel pandas or ask your own question.

Related

Hot Network Questions