Hi, I would like to extract strings from an input file like the one below:
>a11
UCUUUGGUUAUCUAGCUGUAUGA
>a11
UCUUUGGUUAUCUAGCUGUAUGA
>b22
UGGUCGACCAGUUGGAAAGUAAU
>b22
ACUUCACCUGGUCCACUAGCCGU
>b22
AGGUUGUCUGUGAUGAGUUCG
>t33
UUAAUGCUAAUCGUGAUAGGGGU
>t33
CAGUAACAAAGAUUCAUCCUUGU
A line that starts with ">" is a header, and the line below it is a sequence.
I would like to extract only the sequences whose header starts with ">b22".
This is my code, which does not give the proper answer:
def extractData():
    filename = "data.txt"
    infile = open(filename, 'r')
    for x in infile.readlines():
        x = x.strip()
        if x.startswith(">"):
            header = x
        else:
            sequence = x
        if header.startswith(">b22"):
            print(header, sequence)
    infile.close()

extractData()
It gives a result like this:
>b22 UCUUUGGUUAUCUAGCUGUAUGA
>b22 UGGUCGACCAGUUGGAAAGUAAU
>b22 UGGUCGACCAGUUGGAAAGUAAU
>b22 ACUUCACCUGGUCCACUAGCCGU
>b22 ACUUCACCUGGUCCACUAGCCGU
>b22 AGGUUGUCUGUGAUGAGUUCG
But my expected result is this:
>b22 UGGUCGACCAGUUGGAAAGUAAU
>b22 ACUUCACCUGGUCCACUAGCCGU
>b22 AGGUUGUCUGUGAUGAGUUCG
Can somebody help me fix this, please? What is the problem, and what should I change to get the correct result?
Thank you in advance.
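
A possible explanation, judging from the output above: the ">b22" check appears to run on every line, including header lines. When a new header has just been read, the sequence variable still holds the value from the previous record, so that stale sequence is printed once more under the new header. Below is a minimal sketch of one way around this, where a pair is recorded only when a sequence line is read. The function name extract_sequences, its prefix parameter, and the inline sample data are assumptions made for illustration; they are not part of the original code.

```python
def extract_sequences(lines, prefix=">b22"):
    """Return (header, sequence) pairs whose header starts with `prefix`.

    `lines` can be any iterable of strings, for example an open file.
    """
    results = []
    header = None  # most recent header line seen so far
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Header line: remember it, but do not record anything yet.
            header = line
        elif header is not None and header.startswith(prefix):
            # Sequence line: record it only if its header matches.
            results.append((header, line))
    return results


# Sample data copied from the question, inlined so the sketch is self-contained.
sample = """\
>a11
UCUUUGGUUAUCUAGCUGUAUGA
>a11
UCUUUGGUUAUCUAGCUGUAUGA
>b22
UGGUCGACCAGUUGGAAAGUAAU
>b22
ACUUCACCUGGUCCACUAGCCGU
>b22
AGGUUGUCUGUGAUGAGUUCG
>t33
UUAAUGCUAAUCGUGAUAGGGGU
>t33
CAGUAACAAAGAUUCAUCCUUGU
"""

for header, sequence in extract_sequences(sample.splitlines()):
    print(header, sequence)
```

To read from the real file instead, the same function could be called as extract_sequences(infile) inside a "with open("data.txt") as infile:" block, which also closes the file automatically. Note that startswith(">b22") would also match a header such as ">b222"; if exact matching is needed, comparing with header == ">b22" may be safer.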