0

I have an input file that looks like this:

>Seq_1;1
AAAAAAAAAAAAAAAAAAAAA
>Seq_2;1
CCCCCCCCCCCCCCCCCCCCC

There are many more pairs of lines like that. I simply want to print each pair on a single line, like this:

>Seq_1;1 AAAAAAAAAAAAAAAAAAAAA
>Seq_2;1 CCCCCCCCCCCCCCCCCCCCC

Why does this code fail?

#!/usr/bin/perl -w

   while ( <> ) {
        chomp;
        my $line = $_;
        my $rdn = "";
        my $sq  = "";

        if ( $line =~ /^>/ ) {
            $rdn = $line;
        }
        elsif ($line =~ /^[ATCG]/) {
            $sq = $line;
        }

         print "$rdn $sq\n";

    }

It prints this instead:

>Seq_1;1
 AAAAAAAAAAAAAAAAAAAAA
>Seq_2;1
 CCCCCCCCCCCCCCCCCCCCC
1
  • This doesn't answer your question; but you may find it helpful: xargs -n 2 < file.fa
    – Steve
    Commented Jun 3, 2013 at 10:04

4 Answers

2

Since your data is 'pairwise', and unless you want to explicitly check each line for the patterns you describe, why not just read two lines at a time? Then do your processing:

#!/usr/bin/perl

use strict;
use warnings;


while (my $line1 = <>) {
    my $line2 = <>;

    chomp $line1;
    chomp $line2;

#   ...do_something...

    print "$line1 $line2\n";
}

Results:

>Seq_1;1 AAAAAAAAAAAAAAAAAAAAA
>Seq_2;1 CCCCCCCCCCCCCCCCCCCCC
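
If the input could end with a header that has no sequence line after it, the second read returns undef and the loop above emits "uninitialized value" warnings. Below is a minimal sketch of one way to guard against that, assuming you would rather print the lone header than abort. The sub name `join_pairs` is hypothetical, and it takes a filehandle so it can be tried on an in-memory string:

```perl
use strict;
use warnings;

# Hypothetical helper: read two lines at a time from a filehandle and
# return one joined string per pair.
sub join_pairs {
    my ($fh) = @_;
    my @out;
    while (defined(my $header = <$fh>)) {
        chomp $header;
        my $seq = <$fh>;
        if (defined $seq) {
            chomp $seq;
            push @out, "$header $seq";
        }
        else {
            # Odd number of lines: keep the unpaired header as-is
            # (an assumption; you may prefer to die here instead).
            push @out, $header;
        }
    }
    return @out;
}

# Demo on an in-memory file whose last header has no sequence line.
my $demo_data = ">Seq_1;1\nAAAAAAAAAAAAAAAAAAAAA\n>Seq_2;1\n";
open my $demo_fh, '<', \$demo_data or die "Cannot open in-memory file: $!";
print "$_\n" for join_pairs($demo_fh);
close $demo_fh;
```

To run it on real input, pass `\*STDIN` or a handle opened on your file in place of the in-memory handle.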
1

This awk one-liner would do it:

$ awk '/^>/ {getline a; print $0,a}' file

For each line starting with >, it loads the next line into the variable a and then prints both together.

Test

$ cat file
>Seq_1;1
AAAAAAAAAAAAAAAAAAAAA
>Seq_2;1
CCCCCCCCCCCCCCCCCCCCC
$ awk '/^>/ {getline a; print $0,a}' file
>Seq_1;1 AAAAAAAAAAAAAAAAAAAAA
>Seq_2;1 CCCCCCCCCCCCCCCCCCCCC
2
  • Use getline() only when necessary. It's not necessary here. There are safer/better ways to do it.
    – Steve
    Commented Jun 3, 2013 at 10:04
  • Thank you, @SuicidalSteve! I will take a look at it later on and try to update my answer with a safer method.
    – fedorqui
    Commented Jun 3, 2013 at 10:15
1

Because you're reading the file line by line and printing on every line. You probably want something more like this: store the header line in one iteration, then print it together with the sequence in the next iteration. Note that this code is by no means the best: if your file isn't laid out exactly as you've posted above, it will almost certainly print the wrong thing.

#!/usr/bin/perl -w

my $rdn = "";

while ( <> ) {
    chomp;
    my $line = $_;
    if ( $line =~ /^>/ ) {
        $rdn = $line;
        next;
    }
    elsif ( $line =~ /^[ATCG]/ ) {
        my $sq = $line;
        print "$rdn $sq\n";
    }
}
1
  • The next in the then-part of the if is not needed as the only other code in the loop is an elsif clause. Alternatively, keep the next and change the elsif to an if.
    – AdrianHHH
    Commented Jun 3, 2013 at 10:16
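
One layout that breaks the strict two-line assumption is a sequence wrapped over several lines. Here is a rough sketch of how the header-remembering idea could be extended to cover that, assuming every record starts with a ">" line and that anything before the first header can be ignored. The sub name `join_records` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: join each ">" header with all sequence lines that
# follow it, so a wrapped (multi-line) sequence ends up on one output line.
sub join_records {
    my ($fh) = @_;
    my @out;
    my ($header, $seq);

    # Emit the record collected so far, if any. A header without any
    # sequence is emitted on its own, with no trailing space.
    my $flush = sub {
        return unless defined $header;
        push @out, length($seq) ? "$header $seq" : $header;
    };

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            $flush->();                  # a new header closes the previous record
            ($header, $seq) = ($line, "");
        }
        elsif (defined $header) {
            $seq .= $line;               # append a continuation line
        }
        # Lines before the first header are skipped (an assumption).
    }
    $flush->();                          # do not forget the last record
    return @out;
}

# Demo: the second sequence is wrapped over two lines.
my $demo_fasta = ">Seq_1;1\nAAAA\n>Seq_2;1\nCCCC\nCCCC\n";
open my $demo_in, '<', \$demo_fasta or die "Cannot open in-memory file: $!";
print "$_\n" for join_records($demo_in);
close $demo_in;
```

Unlike the version above, this one does not filter sequence lines by `/^[ATCG]/`, so lines containing other characters (for example N) are kept; whether that is desirable depends on your data.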
1

In each iteration, you set either $rdn or $sq, never both. Then you print both of them (one of them always "") with a \n at the end.

Try this, the idea being to only chomp off the \n if it's an even-numbered line, in that case printing a space instead:

my $lineno = 0;
while (<>) {
   if ($lineno % 2 == 0) {
      chomp;
      print $_, " ";
   } else {
      print;
   }
   $lineno++;
}
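
Perl already tracks the current input line number in the special variable `$.`, so the manual counter is optional. The following is a minimal sketch of the same idea, still assuming strictly alternating header and sequence lines. Note that `$.` starts at 1, so headers fall on odd line numbers here. The sub name `join_alternating` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: join alternating header/sequence lines using $.
# (the line number of the last-read filehandle) instead of a counter.
sub join_alternating {
    my ($fh) = @_;
    my $out = "";
    while (my $line = <$fh>) {
        if ($. % 2) {
            # Odd line number: header line, swap its newline for a space.
            chomp $line;
            $out .= "$line ";
        }
        else {
            # Even line number: sequence line, keep its newline.
            $out .= $line;
        }
    }
    return $out;
}

# Demo on an in-memory copy of the sample input.
my $demo_text = ">Seq_1;1\nAAAAAAAAAAAAAAAAAAAAA\n>Seq_2;1\nCCCCCCCCCCCCCCCCCCCCC\n";
open my $demo_handle, '<', \$demo_text or die "Cannot open in-memory file: $!";
print join_alternating($demo_handle);
close $demo_handle;
```

When reading several files through `<>`, `$.` keeps counting across files unless ARGV is closed at each end of file, so this variant is simplest with a single input.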
