I have written the following script, which is pointed at a CSV through its file name, splits the file into "drops" (for mailings), and does a couple of operations on each one. As of now it works, but it seems fairly slow when I use it on any significant number of records (>5000). There is also a section where I need to seed each file with semi-static data, and the portion of the code where I store that data is just ugly. I would welcome suggested improvements from an approach, style, logic, or really any other perspective.

What follows are the functions for seeding the file, for creating and writing the CSVs, and for transforming the rows for output. (seeding portion is in the following gist)

Part of the challenge in making this is that the headers might not always be the same or in the same order, so I need some way of comparing an "ideal" header to what is actually in the file, or so I think. As of right now the code is tragically uncommented, so I'll work on editing comments in soon.
def create_output_csvs(source_file, start_WO, start_title, po_number)
  # Prep variables for use
  drop_sizes = read_drop_sizes
  dealer_pin = read_dealer_pin
  purl = read_dealer_purl
  purl = ".#{purl}" if purl[0] != '.'
  drop = 0
  current_title = start_title
  current_drop_number = start_WO
  stop_at = drop_sizes[drop] - 1
  start_at = 0
  pin_seq = 100_099
  ipd_head = header_to_ipd_header(source_file)
  write_file = "#{current_drop_number} for import.csv"
  CSV.foreach(source_file, headers: true).each_with_index do |row, i|
    write_header_to(ipd_head, write_file) if i == start_at
    pin_seq += 1
    pin = "#{dealer_pin}-#{pin_seq}"
    CSV.open(write_file, 'a') { |out| out << transform_row(row, pin, purl) }
    if i == stop_at
      ipd_seed(write_file, current_drop_number, pin_seq, dealer_pin, purl)
      drop += 1
      create_ipd_pallet_flag(current_drop_number, po_number, current_title)
      if drop < drop_sizes.size
        current_drop_number = next_work_order_number(current_drop_number)
        current_title = next_job_title(current_title)
        write_file = "#{current_drop_number} for import.csv"
        pin_seq = 100_100 * (drop + 1)
        pin = "#{dealer_pin}-#{pin_seq}"
        # The next drop starts on the row right after the one that just ended
        start_at = stop_at + 1
        stop_at += drop_sizes[drop]
        nav_to_next_drop_folder("#{current_drop_number} #{current_title}")
      end
    end
    break if drop == drop_sizes.length
  end
end
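One likely contributor to the slowness is that `CSV.open(write_file, 'a')` runs once per row, so every record pays for opening and closing the output file. The following is only a minimal sketch of the alternative, keeping one handle open per drop. It is not the script itself: `split_into_drops`, the `out_dir` parameter, and the `drop_N.csv` file names are hypothetical, and the seeding, PIN, and folder-navigation steps are left out. It also assumes every drop size is a positive integer.

```ruby
require 'csv'

# Hypothetical helper (not part of the original script): copy rows from
# source_file into one output file per drop, opening each output file once
# rather than once per row. Assumes every entry in drop_sizes is > 0.
def split_into_drops(source_file, drop_sizes, out_dir)
  paths = []
  sizes = drop_sizes.dup
  out = nil
  remaining = 0

  CSV.foreach(source_file, headers: true) do |row|
    if remaining.zero?
      # The previous drop (if any) is full, so close it before moving on.
      out&.close
      out = nil
      break if sizes.empty? # every drop is filled; ignore leftover rows

      remaining = sizes.shift
      paths << File.join(out_dir, "drop_#{paths.size + 1}.csv")
      out = CSV.open(paths.last, 'w')
      out << row.headers # header line, written once per drop
    end
    out << row.fields
    remaining -= 1
  end
  paths
ensure
  # Close the last handle even if an error interrupted the loop.
  out&.close
end
```

In the real script, the `out << row.fields` line is where the call to `transform_row` would go, and the point where a drop fills up is where `ipd_seed` and `create_ipd_pallet_flag` would be called.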
Individual row transformations:
def transform_row(row, pin, purl)
  tmphead = row.headers
  lname_i = tmphead.find_index { |l| /lname.*|last.*/i =~ l }
  fname_i = tmphead.find_index { |l| /fname.*|first.*/i =~ l }
  first = row[fname_i].capitalize.gsub(/\s+/, '')
  last = row[lname_i].capitalize.gsub(/\s+/, '')
  sfx_i = tmphead.find_index { |l| /sfx.*|suffix.*|sufx.*/i =~ l }
  mi_i = tmphead.find_index { |l| /mi.*|mname/i =~ l }
  row[lname_i] = full_name(first, last, row[mi_i], row[sfx_i])
  row[mi_i] = salutat(first, last, row[sfx_i])
  row[fname_i] = first
  full_purl = "#{first}#{last}#{purl}".downcase
  address2_i = tmphead.find_index { |l| /add.*2/i =~ l }
  if address2_i
    address = "#{row[(address2_i - 1)]} #{row[address2_i]}"
    row[address2_i - 1] = address
    row.delete(address2_i)
  end
  row << pin
  row << full_purl
  row.delete_if { |h| col_blacklist?(h[0]) }
  row
end
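Since `transform_row` repeats the same five header searches for every row, one way to compare the "ideal" columns with the actual headers is to resolve the positions a single time per file. This is a rough sketch under assumptions: `COLUMN_PATTERNS` and `locate_columns` are made-up names, and the patterns are simply copied from `transform_row` (they are unanchored, so they match anywhere inside a header name).

```ruby
# Hypothetical lookup table: the "ideal" columns and the patterns used to
# recognise them, copied from transform_row.
COLUMN_PATTERNS = {
  first:    /fname.*|first.*/i,
  last:     /lname.*|last.*/i,
  middle:   /mi.*|mname/i,
  suffix:   /sfx.*|suffix.*|sufx.*/i,
  address2: /add.*2/i
}.freeze

# Returns a hash mapping each ideal column to the index of the first header
# that matches its pattern, or nil when no header matches.
def locate_columns(headers)
  COLUMN_PATTERNS.transform_values do |pattern|
    headers.find_index { |header| pattern =~ header }
  end
end
```

Calling `locate_columns` once on the file's headers before the loop, and passing the result into `transform_row`, would avoid the per-row regex scans. The `nil` entries also show which expected columns are absent from a given file.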
Any suggestions for improvement would be appreciated, but primarily I am interested in reworking the code so that it's more readable/maintainable as well as speeding it up in general.