Performance optimization in function for datastructure mapping

Question

I want to optimize a Perl function which is frequently used in my application. The function creates a special datastructure from the results of DBI::fetchall_arrayref which looks like:

$columns = ['COLNAME_1','COLNAME_2','COLNAME_3']
$rows    = [ ['row_1_col_1', 'row_1_col_2', 'row_1_col_3'],
             ['row_2_col_1', 'row_2_col_2', 'row_2_col_3'],
             ['row_3_col_1', 'row_3_col_2', 'row_3_col_3']
];

The new datastructure must contain the data in the following form (all row-values for every column in a single arrayref)

$retval = { 
    row_count => 3,
    col_count => 3,
    COLNAME_1 => ['row_1_col_1', 'row_2_col_1', 'row_3_col_1' ],
    COLNAME_2 => ['row_1_col_2', 'row_2_col_2', 'row_3_col_2' ],
    COLNAME_3 => ['row_1_col_3', 'row_2_col_3', 'row_3_col_3' ]
}

The new datastructure is a Hash of Arrays and is used in the whole application. I cannot change the format (its too frequently used). I wrote a function for this conversion. I've already done some some performance optimization after profiling my application. But it's not enough. Now the function looks like:

sub reorganize($$) {
    my ($self,$columns,$rows) = @_;
    my $col_count = scalar(@$columns);
    my $row_count = scalar(@$rows);
    my $col_index = 0;
    my $row_index = 0;
    my $retval = {  # new datastructure
        row_count   => $row_count,
        col_count   => $col_count    
    };

    # iterate through all columns
    for($col_index=0; $col_index<$col_count; $col_index++) {
        # create a arrayref for all row-values of the current column
        # set it to the correct size and assign all values to this arrayref
        my $tmp = [];
        $#{$tmp} = $row_count-1;    # set size of array to the number of rows

        # iterate through all rows
        for($row_index=0; $row_index<$row_count; $row_index++) {      
            # assign values to arrayref (which has the correct size) instead of a "slow" push
            $tmp->[$row_index] = $rows->[$row_index][$col_index];            
        }
        # Assign the arrayref to the hash. The hash-key is the name of the column
        $retval->{$columns->[$col_index]} = $tmp;
    }
    return $retval;
}

My Question:

Is there a way to further optimize this function (maybe using $[...])? I found some hints here at page 18 and 19, but I don't have any experience in using $ in different contexts.

I have to say that the function listed above is the best I can do. There may be other ways to do some optimization which I have never heard of.

Just a note: It seems ($self) you are using the subroutine as a method. You can remove the prototypes ($$) as they are ignored in method calls anyway. — choroba, Mar 11 '14 at 12:12

choroba · Answer 1 · 2014-03-11 12:35:19Z

up vote 2 down vote

The following code is about 35% faster (measured with Benchmark). The tricks:

no anonymous array created for $tmp.
explicit return removed.
variables created in place where their value is needed.

Some of the tricks added just a 3%, the first one seemed the most important. YMMV.

I experimented with $_ and maps, too, but it seems the plain old C-style loop is the fastest.

sub faster {
    my ($self, $columns, $rows) = @_;
    my $retval = {
        row_count   => my $row_count = @$rows,
        col_count   => my $col_count = @$columns,
    };
    for (my $col_index = 0 ; $col_index < $col_count ; $col_index++) {
        my $tmp;
        for (my $row_index = 0 ; $row_index < $row_count ; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        $retval->{$columns->[$col_index]} = $tmp;
    }
    $retval
}

answered Mar 11 '14 at 12:35

choroba
20616

Thanks you very much. The first and second tricks are really interesting and from a c-style point of view very strange. i will try it with NYTProf asap. – some_coder Mar 11 '14 at 12:45

@some_coder: Forget the C-style point of view when optimizing Perl :-) – choroba Mar 11 '14 at 12:47

In my benchmark (yay!) I've observed foreach loop was faster (and, what's even more important, more readable), so please change C-style fors to for my $col_index (0 .. $col_count - 1). – Xaerxess Mar 11 '14 at 13:24

@Xaerxess: In my Benchmark, switching to this style loop was slower. – choroba Mar 11 '14 at 13:26

@choroba I guess it depends on how many iterations you're doing - see this gist, for-c outperforms foreach only up to 4-8 elements to iterate over. Still, in terms of readability, foreach wins, and that said I wish I'd never have to optimize fors. – Xaerxess Mar 11 '14 at 13:55

add a comment |

asked	1 year ago
viewed	78 times
active	1 year ago

current community

your communities

more stack exchange communities

Performance optimization in function for datastructure mapping

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged performance perl or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Performance optimization in function for datastructure mapping

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged performance perl or ask your own question.

Related

Hot Network Questions