
I have a pretty weird problem. I did some calculations for bioinformatics, but my downstream apps won't accept my header information in its current format. In order to circumvent this I modified my fasta header information. To clarify:

The original header looks like this: ">abc1"

The modified header looks like this: ">abc1|abc1"

There are now 1.3 million modified headers. All of them follow the "abc" pattern, with the number after it designating the contig number.

My calculation files are mostly tab-separated values, but they contain the older header information. Is there any way I can use awk or sed or something similar to replace all occurrences of "abc1" with "abc1|abc1" automatically for all 1.3 million occurrences? Likewise, every abc2 would become abc2|abc2, and so on.

Redoing the calculations with the modified header information would take quite a long time, so I really don't want to redo the work just because the header information changed.

What is a "fasta header" and what does it have to do with your "weird problem"? Please be so polite and read help→tour, which tells you among other things, to leave out chit-chat (like the thanks I already removed) – Anthon Sep 12 at 5:55
Please edit your question and show us a sample of your input files. Are these actual fasta sequences? Is the header always on a single line? We can't help you unless you show us the format of the input file. – terdon Sep 12 at 10:08

1 Answer

Using sed like this?

sed -r -e 's/^>(abc[0-9]+)/>\1|\1/g' input.txt > output.txt

You should show some of your actual input and expected output to get more accurate answers.
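The sed command above only touches lines that start with ">", i.e. the fasta headers themselves. If the calculation files instead hold the bare ID (no leading ">") in tab-separated columns, something along these lines might work. This is only a sketch under that assumption, since no sample input was posted; the function name rename_ids and the file names are made up for illustration.

```shell
# Hypothetical helper: rewrite every tab-separated field that consists
# solely of an old-style ID ("abc" followed by digits) as "ID|ID".
# Assumes the IDs occupy whole fields; other fields are left untouched.
rename_ids() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^abc[0-9]+$/)   # whole field must be the ID
                     $i = $i "|" $i
             print
         }' "$@"
}

# Example call (file names are placeholders):
# rename_ids calculations.tsv > calculations.renamed.tsv
```

Because the pattern is anchored to the whole field, a field that already reads abc1|abc1 is not changed again, so running it twice should be harmless. If the IDs are embedded inside longer fields, the pattern would need to be adapted.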
