Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I want to optimize a Perl function which is frequently used in my application. The function creates a special datastructure from the results of DBI::fetchall_arrayref which looks like:

$columns = ['COLNAME_1','COLNAME_2','COLNAME_3']
$rows    = [ ['row_1_col_1', 'row_1_col_2', 'row_1_col_3'],
             ['row_2_col_1', 'row_2_col_2', 'row_2_col_3'],
             ['row_3_col_1', 'row_3_col_2', 'row_3_col_3']
];

The new datastructure must contain the data in the following form (all row-values for every column in a single arrayref)

$retval = { 
    row_count => 3,
    col_count => 3,
    COLNAME_1 => ['row_1_col_1', 'row_2_col_1', 'row_3_col_1' ],
    COLNAME_2 => ['row_1_col_2', 'row_2_col_2', 'row_3_col_2' ],
    COLNAME_3 => ['row_1_col_3', 'row_2_col_3', 'row_3_col_3' ]
}

The new datastructure is a Hash of Arrays and is used in the whole application. I cannot change the format (its too frequently used). I wrote a function for this conversion. I've already done some some performance optimization after profiling my application. But it's not enough. Now the function looks like:

sub reorganize($$) {
    my ($self,$columns,$rows) = @_;
    my $col_count = scalar(@$columns);
    my $row_count = scalar(@$rows);
    my $col_index = 0;
    my $row_index = 0;
    my $retval = {  # new datastructure
        row_count   => $row_count,
        col_count   => $col_count    
    };

    # iterate through all columns
    for($col_index=0; $col_index<$col_count; $col_index++) {
        # create a arrayref for all row-values of the current column
        # set it to the correct size and assign all values to this arrayref
        my $tmp = [];
        $#{$tmp} = $row_count-1;    # set size of array to the number of rows

        # iterate through all rows
        for($row_index=0; $row_index<$row_count; $row_index++) {      
            # assign values to arrayref (which has the correct size) instead of a "slow" push
            $tmp->[$row_index] = $rows->[$row_index][$col_index];            
        }
        # Assign the arrayref to the hash. The hash-key is the name of the column
        $retval->{$columns->[$col_index]} = $tmp;
    }
    return $retval;
}

My Question:

Is there a way to further optimize this function (maybe using $[...])? I found some hints here at page 18 and 19, but I don't have any experience in using $ in different contexts.

I have to say that the function listed above is the best I can do. There may be other ways to do some optimization which I have never heard of.

share|improve this question
    
Just a note: It seems ($self) you are using the subroutine as a method. You can remove the prototypes ($$) as they are ignored in method calls anyway. –  choroba Mar 11 '14 at 12:12

1 Answer 1

The following code is about 35% faster (measured with Benchmark). The tricks:

  • no anonymous array created for $tmp.

  • explicit return removed.

  • variables created in place where their value is needed.

Some of the tricks added just a 3%, the first one seemed the most important. YMMV.

I experimented with $_ and maps, too, but it seems the plain old C-style loop is the fastest.

sub faster {
    my ($self, $columns, $rows) = @_;
    my $retval = {
        row_count   => my $row_count = @$rows,
        col_count   => my $col_count = @$columns,
    };
    for (my $col_index = 0 ; $col_index < $col_count ; $col_index++) {
        my $tmp;
        for (my $row_index = 0 ; $row_index < $row_count ; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        $retval->{$columns->[$col_index]} = $tmp;
    }
    $retval
}
share|improve this answer
    
Thanks you very much. The first and second tricks are really interesting and from a c-style point of view very strange. i will try it with NYTProf asap. –  some_coder Mar 11 '14 at 12:45
    
@some_coder: Forget the C-style point of view when optimizing Perl :-) –  choroba Mar 11 '14 at 12:47
    
In my benchmark (yay!) I've observed foreach loop was faster (and, what's even more important, more readable), so please change C-style fors to for my $col_index (0 .. $col_count - 1). –  Xaerxess Mar 11 '14 at 13:24
    
@Xaerxess: In my Benchmark, switching to this style loop was slower. –  choroba Mar 11 '14 at 13:26
    
@choroba I guess it depends on how many iterations you're doing - see this gist, for-c outperforms foreach only up to 4-8 elements to iterate over. Still, in terms of readability, foreach wins, and that said I wish I'd never have to optimize fors. –  Xaerxess Mar 11 '14 at 13:55

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.