Foreach-loop for and print commands

Question

How can I make the following code shorter or efficient (maybe with other loops or other nice ideas), and keep the current functionality?

my $examine = shift; my $detect = 0;
foreach my $ProteinDB (@DB) {
    my $Set =  $ProteinDB->{'ID'};
    if($Set =~ /$examine/) {
        $detect = 1;
        my @Pointer = keys(%$ProteinDB);
        foreach my $key (@Pointer) {
            if($key ne "ID" && $key ne "SQ") {
                print "$key " . $ProteinDB->{$key} . "\n";
                print "\n";
            }
        }
        print "SQ  " . $ProteinDB->{"SQ"} . "\n";
        print "//\n";
    }
}
if($detect == 0) {
    print "Error : For the ID $examine NO-Entry could be Found !\n";
}

It's nice code and easy to understand, but I don't have that experience that I can tell what I can do more with this code. I am gonna keep trying to change it until it's shorter.

the Code is fast enough but its too long thats my problem i am trying to use another things(loops, regex,.. to make it shorter, but have the same functionality — user3075044, Mar 31 at 14:46

mpapec · Accepted Answer · 2014-03-31 18:33:36Z

This is shorter, but not necessarily more efficient,

my $examine = shift; my $detect = 0;
for my $ProteinDB (@DB) {
    next if $ProteinDB->{'ID'} !~ /$examine/;
    $detect = 1;
    print "$_ $ProteinDB->{$_}\n\n"
      for grep $_ ne "ID" && $_ ne "SQ", keys %$ProteinDB;
    print "SQ  $ProteinDB->{SQ}\n//\n";
}
$detect or print "Error : For the ID $examine NO-Entry could be Found !\n";

and if you push it little more,

my $examine = shift; my $detect = 0;
print map {
    $detect = 1;
    my $ProteinDB = $_;
    (
      (map "$_ $ProteinDB->{$_}\n\n", grep $_ ne "ID" && $_ ne "SQ", keys %$ProteinDB),
      "SQ  $ProteinDB->{SQ}\n//\n"
    );
  }
  grep $_->{ID} =~ /$examine/, @DB;
$detect or print "Error : For the ID $examine NO-Entry could be Found !\n";

amon · Answer 2 · 2014-03-31 16:05:42Z

This is nice, well-formatted code. Let's see what we can improve:

Only use one statement per line:
```
my $examine = shift;
my $detect = 0;
```
I assume this code is from the body of a subroutine. I'd recommend not using shift to unpack arguments, but rather list assignment. This at least looks like a formal parameter list:
```
sub foo {
    my ($examine) = @_;
    ...
}
```
Otherwise, you can also use syntax extensions like signatures: sub ($examine) { ... }.
Where does @DB come from? Quite likely, a reference to that array should be passed as one parameter.
The Perl community has no single unifying coding style, but there are a few conventions that are shared by a large part of that community. For example, variables should use $snake_case, and not $CamelCase. But whatever you choose, just be consistent. Either all variables start with an uppercase letter, or none.
I'd rather use the shorter for instead of foreach. The two keywords are absolutely interchangeable.
Builtin operators like keys are commonly used without parens around the arguments: keys %$ProteinDB instead of keys(%$ProteinDB)

Don't introduce unnecessary variables like $Set or @Pointer.

For example,

my $Set =  $ProteinDB->{'ID'};
if($Set =~ /$examine/) {

becomes

if($ProteinDB->{'ID'} =~ /$examine/) {

and this

my @Pointer = keys(%$ProteinDB);
foreach my $key (@Pointer) {

becomes

for my $key (keys %$ProteinDB) {

There is often no need to concatenate strings when you can directly interpolate:
```
print "$key " . $ProteinDB->{$key} . "\n";
```
becomes
```
print "$key $ProteinDB->{$key}\n";
```
Starting with Perl 5.10, you can use the say feature. The say function behaves exactly like print except that it always appends a newline.
A construct like for my $item (@list) { if (condition($item)) { ... } } can be simplified using the grep builtin:
```
for my $item (grep { condition($_) } @list) {
    ....
}
```
Alternatively, you can quickly jump to the next loop iteration if the condition fails, which avoids a level of indentation:
```
for my $item (@list) {
    next if not condition($item);
    ...
}
```
Your $detect variable just has a boolean value. In that case, only test whether it's truthy or falsey, not whether it's equal to 0:
```
if (not $detect) {
```
The wording of your error message isn't exactly clear.

Wrapping all that together, I'd rewrite your code to this:

use feature 'say';

sub foo {
    my ($id_regex, $db) = @_;

    my $detected = 0;
    for my $protein (@$db) {
        next if not $protein->{ID} =~ /$id_regex/;

        $detected = 1;

        for my $key (keys %$protein) {
            next if $key eq 'ID' or $key eq 'SQ';

            say "$key $protein->{$key}";
            say "";
        }

        say "SQ  $protein->{SQ}";
        say "//";
    }

    if (not $detected) {
        say "Error: No entry for ID =~ /$id_regex/ could be found!";
    }
}

Notice the strategically placed empty lines to emphasize what belongs together, and what is unrelated.

Further observations, not all of which will be applicable to your case:

You are testing that $protein->{ID} matches some regex. For example, when you have the IDs 1, 42, and 123, then asking for 1 would return both 1 and 123. The way you've written it you could also ask for matches of arbitrary regexes, e.g. foo(qr/\A1.*1\z/, \@DB) (any ID that starts and ends with a 1). Is this what you intendted? Or do you only want those proteins where the ID is actually equal? Then use the eq string comparision instead of a regex match.
If that is the case, you could also index your proteins in a hash. That takes as much time as looping through all elements, but once you are done a lookup is very fast. Do this if you make multiple lookups:
```
my %proteins_by_id;
for my $protein (@DB) {
      $proteins_by_id{$protein->{ID}} = $protein;
}
```
or abbreviated:
```
my %proteins_by_id;
@proteins_by_id{ map { $_->{ID} } @DB } = @DB;
```
which uses a hash slice @hash{@keys}, and the map functions which is like a foreach but returns a list of results.

Then: my $detected = exists $proteins_by_id{$expected_id}

This of course only works if each ID is unique, otherwise we need a hash of arrays:
```
my %proteins_by_id;
for my $protein (@DB) {
      push @{ $proteins_by_id{$protein->{ID}} }, $protein;
}
```
When looping over all keys, they will be in an unspecified order. If you want the output to stay the same between runs, you must sort them:
```
for my $key (sort keys %$protein) {
```
When looping over all keys, you skip the SQ entry but then print it out anyway. Is this really necessary? Or do you just want to make sure that it's printed last? If there is no real need, remove this.
You simply print your error. But unless this is some simple one-off script, it is usually better to return a value indicating whether your subroutine has completed successfully. Even better: die with an error. This will abort normal control flow of your script. The error will be passed upwards through the call stack. If it isn't handled, it will terminate the script. Exceptions are often caught via eval { ... }, but the Try::Tiny module is better:
```
try {
    ... # something that can `die`
} catch {
    warn "I caught: $_";  # do something with the error
};  # this ';' is not optional
```
It is considered a bad practice to print out stuff from a subroutine that actually does something different (like searching for matching entries). If it has to be, return a string that the caller can print out. This is important in larger projects where you need to write tests that check that your program behaves correctly – it's easier to check a return value than to capture everything you print.

Assuming I've read your mind correctly, then this code would be better:

use strict;
use warnings;
use feature 'say';
use Try::Tiny;

# index the @DB by ID:
my %proteins_by_id;
for my $protein (@DB) {
      push @{ $proteins_by_id{$protein->{ID}} }, $protein;
}

for my $id (1234, 5678) {
    # print out the report or the error message
    say try { report_matching_id($id, \%proteins_by_id) } catch { "Error: $_" };
}

sub report_matching_id {
    my ($id, $protein_db) = @_;
    my $matching_proteins = $protein_db->{$id};

    # return early if we can see that there are no matches
    # (either the entry doesn't exist, or the array is empty)
    if (not $matching_proteins or not @$matching_proteins) {
        die "No matching proteins for ID=$id found";
    }

    my $output = "";
    for my $protein (@$matching_proteins) {
        # dump each %$protein
        for my $key (sort keys %$protein) {
            next if $key eq 'ID';
            $output .= "$key $protein->{$key}\n\n";
        }
        $output .= "//\n\n";
    }

    return $output;
}

note that snake_case is literally shorter than CamelCase :) (though often wider) — ysth, Mar 31 at 16:24

asked	3 months ago
viewed	79 times
active	3 months ago

current community

your communities

more stack exchange communities

Foreach-loop for and print commands

migrated from stackoverflow.com Mar 31 at 14:54

2 Answers

Your Answer

Not the answer you're looking for? Browse other questions tagged loop perl bioinformatics or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Foreach-loop for and print commands

migrated from stackoverflow.com Mar 31 at 14:54

2 Answers

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged loop perl bioinformatics or ask your own question.

Related

Hot Network Questions