I solved this problem using closures in Perl, and I wonder whether (and how) I could or should have done it differently.
Background:
The user creates a config file describing the properties of a source data file that will be processed:
- Type of file: delimited (and the delimiter) or fixed width
- How many fields to expect
- Which fields are to be pulled out
Examples of the last item (if delimited):
Fields: FIRST_NAME 2
Fields: LAST_NAME 3
Fields: ADDRESS 4
or (fixed width):
Fields: FIRST_NAME 58,30
Fields: LAST_NAME 88,30
Fields: ADDRESS 118,50
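For illustration only, here is a minimal sketch of how one such config line could be split into a field name and its spec. The helper name `parse_field_line` and the exact line format it accepts are assumptions made for this example, not the actual config parser.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "Fields:" config line into (name, spec).
# Returns an empty list when the line is not a Fields: entry.
sub parse_field_line
{
    my ($line) = @_;
    return unless $line =~ /^\s*Fields:\s*(\w+)\s+(\S.*?)\s*$/;
    return ($1, $2);
}

my @config = (
    'Fields: FIRST_NAME 2',       # delimited: column position
    'Fields: LAST_NAME 88,30',    # fixed width: offset,length
);

for my $line (@config)
{
    my ($name, $spec) = parse_field_line($line) or next;
    print "$name => $spec\n";
}
```

The spec is kept as a raw string here, so whichever closure factory applies (column position or offset,length) can validate it in its own way.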
My Perl script reads the config and prepares, ahead of time, the action to take once I start reading the source data file.
Internally, I'm preparing a master hash (per the config), which I'll loop over for each record when reading the source.
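To make the idea concrete, the following is a simplified sketch of a hash of prepared closures being applied to each record. The names `%extract_for` and `extract_fields`, the pipe delimiter, and the sample records are all made up for this example; the real master hash presumably holds more than this.

```perl
use strict;
use warnings;

# Made-up, simplified master hash: field name => closure that extracts that
# field from one parsed record (here an array ref of delimited columns).
my %extract_for = (
    FIRST_NAME => sub { $_[0]->[1] },   # config position 2, zero-based index 1
    LAST_NAME  => sub { $_[0]->[2] },   # config position 3, zero-based index 2
);

# Apply every prepared closure to one record and collect the results.
sub extract_fields
{
    my ($rec) = @_;
    my %out;
    $out{$_} = $extract_for{$_}->($rec) for keys %extract_for;
    return \%out;
}

# Stand-in for reading the source data file line by line.
for my $line ('101|Ada|Lovelace|London', '102|Alan|Turing|Wilmslow')
{
    my $fields = extract_fields([ split /\|/, $line ]);
    print "$fields->{FIRST_NAME} $fields->{LAST_NAME}\n";
}
```

The point of this shape is that all config interpretation happens once, up front, and the per-record loop only calls closures.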
To handle the examples above, I have this code for the case where the source file is delimited:
sub ret_col_sub
{
    my ($col_name, $col_pos) = @_;
    warn qq($col_name col_pos='$col_pos' is not a number) if $col_pos =~ /\D/;
    # Config positions are 1-based; array indexes are 0-based
    $col_pos--;
    # Return closure to caller, which will be called later when we're reading the source data file
    return sub
    {
        my $rec = shift;
        return $rec->[$col_pos];
    };
}
or fixed width:
sub ret_substr_sub
{
    my ($col_name, $substr_params) = @_;
    my ($offset, $len) = $substr_params =~ /(\d+)\s*,\s*(\d+)/;
    # The offset might be ZERO, check for length ... but $len must always be P O S I T I V E !
    warn qq($col_name substr_params='$substr_params' is missing the offset and length needed by substr) if not length $offset or not $len;
    # Config offsets are 1-based; substr expects a 0-based offset
    $offset--;
    # $len is a character count, so it is passed to substr unchanged
    # (decrementing it would drop the last character of every field)
    # Return closure to caller, which will be called later when we're reading the source data file
    return sub
    {
        my $rec = shift;
        return substr $rec, $offset, $len;
    };
}
Is there a better or another way of doing this, or any other suggestions?