My input file looks like:

#key    string              pos(string)     
key1    AA000AA000000AAA0A  2, 3, 18, 12    
key2    00A00AAA000AAAA00A  3, 18           

I'd like to append a new tab-separated column to each key line. For every position listed in column 3, if the character at that (1-based) position in the column 2 string is an A, the position should appear in the new column. If the character is a 0, the position should be left out.

Basically, this is the desired output:

#key    string              pos(string)     Apos(string)
key1    AA000AA000000AAA0A  2, 3, 18, 12    2, 18
key2    00A00AAA000AAAA00A  3, 18           3, 18

Short explanation:
(key1)

  • string at index 2 has an A -> item 2 added to the new column
  • string at index 3 has a 0 -> item 3 not added to the new column
  • string at index 12 has a 0 -> item 12 not added to the new column
  • string at index 18 has an A -> item 18 added to the new column

I'm doing this in Python, but I get stuck in nested for loops over keys and items (the strings are quite long to process), so I thought I'd ask for advice on a lighter command-line solution.

What I was thinking of is:

  • split the pos(string) field, get the index that I search for in the string field
  • get the character at given index in the string
  • for statement(?)
I'm having a hard time trying to understand what you want especially when both records have 0s, yet their new column is different. Add more records in your example or more details or perhaps even your Python code. –  Cristian Ciupitu Aug 1 '14 at 19:48
    
The new column is the subset of column 3 for which there's an A at the specified position. –  Barmar Aug 2 '14 at 7:36

3 Answers

Accepted answer

How about the following awk script:

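#!/usr/bin/awk -f
BEGIN {
        FS="\t"
        print "#key\tstring\tpos(string)\tApos(string)"
}

# Skip the input's own header line; the BEGIN block already printed one.
/^#/ { next }

{
        out=""
        printf "%s\t",$0
        # Strip spaces, then split the position list on commas.
        gsub(/ /,"",$3)
        n=split($3,pos,",")
        # Loop by index: "for (i in pos)" does not guarantee any order.
        for (i=1;i<=n;i++){
                # substr() is 1-based and portable, unlike split($2,str,"").
                if (substr($2,pos[i],1)=="A"){
                        out = out pos[i] ", "
                }
        }
        gsub(/, $/,"",out)
        print out
}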

Save it as (for example) findA.awk and make it executable with chmod +x findA.awk.

Then run it against your input data and redirect the output to a new file:

./findA.awk input.txt > output.txt
cat output.txt
#key    string  pos(string) Apos(string)
key1    AA000AA000000AAA0A  2, 3, 18, 12    2, 18
key2    00A00AAA000AAAA00A  3, 18   3, 18

The output isn't as tidy as your example as it is tab delimited (as per your request) and the tab width doesn't align with the width of the various strings.
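If aligned columns matter for display, the tab-delimited output can be padded afterwards. Below is a minimal Python sketch of that idea, not part of the awk script above. The function name align_columns and the gap parameter are made up for illustration, and the sketch assumes every cell is plain text with no embedded tabs.

```python
def align_columns(rows, gap=2):
    """Pad tab-separated rows with spaces so the columns line up on screen."""
    table = [row.split("\t") for row in rows]
    ncols = max(len(cells) for cells in table)
    # Widest cell in each column, ignoring rows that are too short to have it.
    widths = [
        max(len(cells[i]) for cells in table if i < len(cells))
        for i in range(ncols)
    ]
    aligned = []
    for cells in table:
        padded = [cell.ljust(widths[i] + gap) for i, cell in enumerate(cells)]
        aligned.append("".join(padded).rstrip())
    return aligned


if __name__ == "__main__":
    rows = [
        "#key\tstring\tpos(string)\tApos(string)",
        "key1\tAA000AA000000AAA0A\t2, 3, 18, 12\t2, 18",
        "key2\t00A00AAA000AAAA00A\t3, 18\t3, 18",
    ]
    for line in align_columns(rows):
        print(line)
```

The padded result is meant for reading only; keep the tab-delimited file as the real data, since the padding destroys the tab separators.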


I'm not sure how you're doing it now (it would be helpful to see your Python code), but you can create a list of the elements of column 3 which point to an 'A' in column 2 like so:

# COLUMN3 holds 1-based integer positions, so subtract 1 to index COLUMN2
[i for i in COLUMN3 if COLUMN2[i - 1] == 'A']

This seems like a simple problem, but maybe I don't fully understand it. Perhaps you are forgetting that strings are iterables?
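To make that concrete, here is a rough sketch of how the comprehension might be applied to whole lines. It assumes the file is tab-separated with exactly the three columns shown in the question, that positions are 1-based (as the question's example implies), and that header lines start with #. The function names a_positions and add_apos_column are invented for this example.

```python
def a_positions(string, positions):
    """Return the 1-based positions whose character in `string` is 'A'."""
    return [
        p for p in positions
        if 1 <= p <= len(string) and string[p - 1] == "A"
    ]


def add_apos_column(lines):
    """Yield each tab-separated line with an extra Apos(string) column."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # Pass blank lines through unchanged.
            yield line
            continue
        if line.startswith("#"):
            # Header line: only append the new column name.
            yield line + "\tApos(string)"
            continue
        key, string, pos_field = line.split("\t")[:3]
        # int() tolerates the spaces left around each number after the split.
        positions = [int(p) for p in pos_field.split(",") if p.strip()]
        kept = a_positions(string.strip(), positions)
        yield line + "\t" + ", ".join(str(p) for p in kept)


if __name__ == "__main__":
    sample = [
        "#key\tstring\tpos(string)",
        "key1\tAA000AA000000AAA0A\t2, 3, 18, 12",
        "key2\t00A00AAA000AAAA00A\t3, 18",
    ]
    for row in add_apos_column(sample):
        print(row)
```

Each line needs only one pass over its position list, and indexing into a string is constant time, so long strings should not be a problem. To process a real file, pass an open file object instead of the sample list.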


An awful perl:

$ perl -anle '
    printf "%s    Apos(string)\n",$_ and next if /^#/;
    printf "%s",$_;
    $len = 12 - length((split(/\s+/,$_,3))[-1]);
    for $pos_ss (@F[2..$#F]) {
        $char = substr($F[1],int($pos_ss)-1,1);
        push @res, int($pos_ss) if $char eq 'A';
    }
    printf "%@{[12-4+$len]}s\n", join ", ",@res;
    @res=();
' file
#key    string              pos(string)    Apos(string)
key1    AA000AA000000AAA0A  2, 3, 18, 12   2, 18
key2    00A00AAA000AAAA00A  3, 18          3, 18

It works similarly to my solution in this answer, with an added $len variable that computes the field width needed to print the last column aligned.

