The use of IT to analyze biological data.
6 votes, 2 answers, 66 views
Foreach-loop for and print commands
How can I make the following code shorter or more efficient (perhaps with different loops or other ideas) while keeping the current functionality?
my $examine = shift; my $detect = 0;
foreach my $ProteinDB ...
5 votes, 2 answers, 61 views
Genomic Range Query
Recently I worked on one of the Codility training tasks, Genomic Range Query (please refer to one of the evaluation reports for the details of this task).
The proper approach for this question is using ...
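The excerpt is cut off before it names the approach, so what follows is only one common technique, not necessarily the one the asker meant: per-nucleotide prefix counts, which answer each query in constant time after a single O(N) pass. The function name `genomic_range_query` is hypothetical; the impact factors A=1, C=2, G=3, T=4 and the sample input follow the Codility task statement.

```python
from typing import List

# Impact factors as defined by the Codility "Genomic Range Query" task.
IMPACT = {"A": 1, "C": 2, "G": 3, "T": 4}


def genomic_range_query(s: str, p: List[int], q: List[int]) -> List[int]:
    """Return the minimal impact factor inside s[p[k]..q[k]] (inclusive) for each k."""
    n = len(s)
    # prefix[c][i] = number of occurrences of nucleotide c in s[:i]
    prefix = {c: [0] * (n + 1) for c in "ACGT"}
    for i, ch in enumerate(s):
        for c in "ACGT":
            prefix[c][i + 1] = prefix[c][i] + (1 if ch == c else 0)

    answers = []
    for start, end in zip(p, q):
        # Check nucleotides in order of increasing impact; the first one
        # present in the slice gives the minimum.
        for c in "ACGT":
            if prefix[c][end + 1] - prefix[c][start] > 0:
                answers.append(IMPACT[c])
                break
    return answers


print(genomic_range_query("CAGCCTA", [2, 5, 0], [4, 5, 6]))  # [2, 4, 1]
```

The trade-off is O(4N) extra memory in exchange for avoiding a scan of every slice, which is what keeps large query lists fast.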
10 votes, 2 answers, 76 views
Calculating the joint probability of n events from a sample sequence of occurrences
I'm writing an algorithm to take in a sample list of sequences of events, calculate 1-step transition probabilities from the sequences, forward or in reverse, then calculate the joint probability of ...
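A minimal sketch of the first part of that pipeline, assuming a first-order Markov model: transition probabilities are estimated by counting consecutive pairs (optionally over reversed sequences), and the joint probability of a chain is its start probability times the product of its transitions. Using each event's overall frequency as the start probability is an assumption, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence

Probs = Dict[Hashable, Dict[Hashable, float]]


def transition_probs(sequences: List[Sequence[Hashable]], reverse: bool = False) -> Probs:
    """Estimate 1-step transition probabilities P(next | current) from pair counts."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for seq in sequences:
        events = list(reversed(seq)) if reverse else list(seq)
        for cur, nxt in zip(events, events[1:]):
            counts[cur][nxt] += 1
    probs: Probs = {}
    for cur, nexts in counts.items():
        total = sum(nexts.values())
        probs[cur] = {nxt: c / total for nxt, c in nexts.items()}
    return probs


def event_probs(sequences: List[Sequence[Hashable]]) -> Dict[Hashable, float]:
    """Assumed start distribution: the overall frequency of each event."""
    counts = Counter(e for seq in sequences for e in seq)
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}


def joint_probability(events: Sequence[Hashable], trans: Probs, start: Dict[Hashable, float]) -> float:
    """P(e1, ..., en) = P(e1) * prod P(e[i+1] | e[i]); unseen transitions give 0.

    Pass events in the same direction the transition model was trained on.
    """
    if not events:
        return 1.0
    p = start.get(events[0], 0.0)
    for cur, nxt in zip(events, events[1:]):
        p *= trans.get(cur, {}).get(nxt, 0.0)
    return p


samples = [["a", "b", "a"], ["a", "b", "b"]]
trans = transition_probs(samples)
start = event_probs(samples)
print(joint_probability(["a", "b", "b"], trans, start))  # 0.25
```

For long chains, summing log-probabilities instead of multiplying would avoid floating-point underflow.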
4 votes, 2 answers, 153 views
How to improve this Needleman-Wunsch implementation in C#?
I split my implementation of this sequence alignment algorithm into three methods: the NeedlemanWunsch method makes use of the ScoringFunction and Traceback methods. Further, I decided to go with ...
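For comparison, here is a Python sketch with the same three-way split (scoring, matrix fill, traceback). The function names mirror the C# methods only loosely, and the scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption, not taken from the question.

```python
from typing import List, Tuple

# Assumed scoring scheme (not from the question).
MATCH, MISMATCH, GAP = 1, -1, -1


def scoring_function(a: str, b: str) -> int:
    """Score for aligning residue a against residue b."""
    return MATCH if a == b else MISMATCH


def traceback(score: List[List[int]], s: str, t: str) -> Tuple[str, str]:
    """Walk back from the bottom-right cell, re-deriving each move from the recurrence."""
    i, j = len(s), len(t)
    out_s: List[str] = []
    out_t: List[str] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + scoring_function(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_s.append(s[i - 1])  # gap in t
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")  # gap in s
            out_t.append(t[j - 1])
            j -= 1
    return "".join(reversed(out_s)), "".join(reversed(out_t))


def needleman_wunsch(s: str, t: str) -> Tuple[int, str, str]:
    """Global alignment score plus one optimal alignment of s and t."""
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + scoring_function(s[i - 1], t[j - 1]),
                score[i - 1][j] + GAP,
                score[i][j - 1] + GAP,
            )
    aligned_s, aligned_t = traceback(score, s, t)
    return score[n][m], aligned_s, aligned_t


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Recomputing moves during traceback avoids storing a separate pointer matrix; the tie-breaking order (diagonal, then up, then left) decides which of several optimal alignments is returned.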
5 votes, 3 answers, 157 views
Code optimization for SQLite result set parsing
I am retrieving information from an SQLite database that gives me back around 20 million rows that I need to process. This information is then transformed into a dict of lists which I need to use. I ...
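One common pattern for this kind of grouping, sketched under assumptions (a two-column `(key, value)` result and a hypothetical `hits` table): iterate the cursor directly instead of calling `fetchall()`, and append into a `collections.defaultdict(list)`, so the full result set is never held as a list of tuples in addition to the dict. The `group_rows` name is hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List


def group_rows(conn: sqlite3.Connection, query: str) -> Dict[str, List[int]]:
    """Group (key, value) result rows into {key: [value, ...]}."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    # Iterating the cursor pulls rows from SQLite incrementally rather than
    # building one huge list first, as fetchall() would.
    for key, value in conn.execute(query):
        grouped[key].append(value)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical table, used only to demonstrate the function.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (gene TEXT, pos INTEGER)")
    conn.executemany("INSERT INTO hits VALUES (?, ?)", [("g1", 5), ("g2", 7), ("g1", 9)])
    print(group_rows(conn, "SELECT gene, pos FROM hits ORDER BY rowid"))  # {'g1': [5, 9], 'g2': [7]}
```

If the per-row Python work is still the bottleneck, pushing filtering or ordering into the SQL query usually helps more than micro-optimizing the loop.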
4 votes, 1 answer, 64 views
Cutting strings into smaller ones based on specific criteria
So, I've got this largish (for me) script, and I'd like to know whether there are any ways to improve it in terms of speed, amount of code, and code quality. I still ...
6 votes, 1 answer, 101 views
Calculate query coverage from BLAST output
I have a BLAST output file and want to calculate query coverage, appending the query lengths as an additional column to the output. Let's say I have
2 7 15
f=open('file.txt', 'r')
...
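A minimal sketch, assuming the file is BLAST+ tabular output (`-outfmt 6`) with the 12 default columns and that query lengths are already available in a dict (for example parsed from the query FASTA; the `query_lengths` mapping and `add_coverage` name are hypothetical). Coverage here is per HSP: (|qend - qstart| + 1) / qlen, times 100.

```python
import csv
import io
import sys
from typing import Dict, Iterable, Iterator, List


def add_coverage(rows: Iterable[List[str]], query_lengths: Dict[str, int]) -> Iterator[List[str]]:
    """Append query length and per-HSP query coverage (%) to outfmt 6 rows.

    Assumes the default columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    for row in rows:
        qlen = query_lengths[row[0]]
        qstart, qend = int(row[6]), int(row[7])
        aligned = abs(qend - qstart) + 1
        yield row + [str(qlen), f"{100.0 * aligned / qlen:.2f}"]


# Hypothetical one-line BLAST result; a real script would use
# open("file.txt", newline="") instead of io.StringIO.
blast_text = "q1\ts1\t98.00\t50\t1\t0\t1\t50\t10\t59\t1e-20\t90.1\n"
query_lengths = {"q1": 100}
writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
for out_row in add_coverage(csv.reader(io.StringIO(blast_text), delimiter="\t"), query_lengths):
    writer.writerow(out_row)
```

If a query has several HSPs, their query intervals would need to be merged before summing to get overall coverage. Recent BLAST+ versions can also report `qlen` and `qcovhsp` directly in a custom `-outfmt 6` column list, which may make the extra step unnecessary.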
5 votes, 5 answers, 288 views
Bioinformatics: Genome string clump finding problem
I am trying to solve a bioinformatics problem from a Stepic course.
The problem posed: find clumps of the same pattern within a longer genome.
Motivation: Identifying 3 occurrences of the same ...
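Assuming the usual (L, t)-clump definition from that course (a k-mer forms a clump if it occurs at least t times inside some window of length L), one way to avoid recounting every window is to record each k-mer's start positions once and then check whether any t consecutive occurrences fit within L characters. The function name `find_clumps` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_clumps(genome: str, k: int, L: int, t: int) -> Set[str]:
    """Return k-mers that occur at least t times inside some window of length L."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    clumps: Set[str] = set()
    for kmer, pos in positions.items():
        # t consecutive occurrences fit in a window of length L when the
        # last one ends no more than L characters after the first one starts.
        for j in range(len(pos) - t + 1):
            if pos[j + t - 1] + k - pos[j] <= L:
                clumps.add(kmer)
                break
    return clumps


print(sorted(find_clumps("ACGTTACGTTACG", k=3, L=13, t=3)))  # ['ACG']
```

Each k-mer's positions are already sorted, so checking consecutive groups of t is enough; the total work is linear in the genome length.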
2 votes, 3 answers, 335 views
Longest DNA sequence that appears at least twice (only one DNA string as input)
My task is to find the longest DNA subsequence that appears at least twice. The input is a single DNA string, not two strings as in other LCS programs.
I have done my 4th program and it seems to be ...
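Assuming "sub-sequence" here means a contiguous substring and that overlapping occurrences count, a short sketch binary-searches on the length: if some substring of length m repeats, so does one of length m - 1, so the check is monotone. The function names are hypothetical; a suffix array would scale better for long genomes.

```python
from typing import Optional


def repeated_substring_of_length(s: str, length: int) -> Optional[str]:
    """Return some substring of the given length that occurs twice, or None."""
    seen = set()
    for i in range(len(s) - length + 1):
        sub = s[i:i + length]
        if sub in seen:
            return sub
        seen.add(sub)
    return None


def longest_repeated_substring(s: str) -> str:
    """Binary search on the repeat length, using a set of slices as the check."""
    lo, hi = 0, len(s) - 1  # a repeat can be at most len(s) - 1 long
    best = ""
    while lo <= hi:
        mid = (lo + hi) // 2
        found = repeated_substring_of_length(s, mid)
        if found is not None:
            best, lo = found, mid + 1
        else:
            hi = mid - 1
    return best


print(longest_repeated_substring("GATTACAGATTA"))  # GATTA
```

The check costs O(n * m) per length because of slicing, so this is O(n^2 log n) in the worst case; hashing windows with a rolling hash would bring each check closer to linear.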
5 votes, 1 answer, 232 views
Performance: equivalent C and C++ programs
I write quite a bit of code in C, but haven't done much C++ since my college CS classes. I have been revisiting C++ recently, and thought I would re-implement a program I had previously written in C, ...
4 votes, 2 answers, 122 views
Efficient parsing of FASTQ
FASTQ is a notoriously bad format. This is because it uses the same @ character for the id line as it does for quality scores. Deciding what is a quality score and what is an id is a tricky endeavor ...
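One way to sidestep the "@" ambiguity, assuming every record occupies exactly four lines (no wrapped sequence or quality lines), is to read fixed four-line blocks and validate them, so the parser never has to guess whether a line starting with "@" is a header. The `parse_fastq` and `FastqRecord` names are hypothetical; wrapped multi-line FASTQ would instead need to read quality characters until their count matches the sequence length.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines, e.g. a trailing empty line
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        # The quality line is taken by position, so a leading '@' there is harmless.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


data = "@r1\nACGT\n+\n@@II\n@r2\nGG\n+r2\nII\n"
for rec in parse_fastq(io.StringIO(data)):
    print(rec)
```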
11 votes, 1 answer, 869 views
Simple DNA sequence finder w/ mismatch tolerance
The goal of this function is to find one DNA sequence within another sequence, with a specified amount of mismatch tolerance. For example:
dnasearch('acc','ccc',0) shouldn't find a match, while
...
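A minimal sketch of such a search, assuming mismatches mean substitutions only (no insertions or deletions) and that the first argument is the pattern: slide a window over the text and count differing positions, stopping early once the tolerance is exceeded. The name `find_with_mismatches` is hypothetical and is not the asker's `dnasearch`.

```python
from typing import List


def find_with_mismatches(pattern: str, text: str, max_mismatches: int) -> List[int]:
    """Return start positions where pattern matches text with at most max_mismatches substitutions."""
    pattern, text = pattern.lower(), text.lower()
    k = len(pattern)
    hits = []
    for start in range(len(text) - k + 1):
        mismatches = 0
        for a, b in zip(pattern, text[start:start + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window already exceeds the tolerance
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


print(find_with_mismatches("acc", "ccc", 0))  # []
print(find_with_mismatches("acc", "ccc", 1))  # [0]
```

Returning all start positions rather than a boolean keeps the function useful for both "is there a match" and "where are the matches" questions.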
2 votes, 2 answers, 130 views
Feedback on text parsing and control structures
I threw together this C program today to handle a bioinformatics data processing task. The program seems to work correctly, but I wanted to know if anyone has suggestions regarding how the input data ...