The use of IT to analyze biological data.

6
votes
2 answers
66 views

Foreach-loop for and print commands

How can I make the following code shorter or more efficient (maybe with other loops or other nice ideas) while keeping the current functionality? my $examine = shift; my $detect = 0; foreach my $ProteinDB ...
5
votes
2 answers
61 views

Genomic Range Query

Recently I worked on one of the Codility Training tasks, Genomic Range Query (please refer to one of the evaluation reports for the details of this task). The proper approach for this question is using ...
10
votes
2 answers
76 views

Calculating the joint probability of n events from a sample sequence of occurrences

I'm writing an algorithm to take in a sample list of sequences of events, calculate 1-step transitional probabilities from the sequences, forward or in reverse, then calculate the joint probability of ...
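
For readers skimming this entry, here is a minimal sketch of the kind of calculation the excerpt describes. It is not the asker's code. It assumes a first-order (Markov) model, estimates 1-step transition probabilities by counting adjacent pairs, and multiplies them along a chain of events. The names `transition_probabilities` and `chain_probability` and the `reverse` flag are hypothetical, and the result is the probability of the chain given its first event.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences, reverse=False):
    """Estimate 1-step transition probabilities from sample sequences.

    With reverse=True each sequence is read backwards, which gives the
    probability of the preceding event given the current one.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        if reverse:
            seq = seq[::-1]
        # Count every adjacent pair (current event, next event).
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for state, following in counts.items()
    }


def chain_probability(events, probs):
    """Multiply the transition probabilities along a chain of events.

    Assumption: the first event is taken as given, so this is the
    probability of the remaining events conditional on the first one.
    Transitions never seen in the samples count as probability 0.
    """
    p = 1.0
    for current, nxt in zip(events, events[1:]):
        p *= probs.get(current, {}).get(nxt, 0.0)
    return p
```

For example, `transition_probabilities(["ABAB", "ABB"])` counts three A-to-B steps, one B-to-A step, and one B-to-B step, so `chain_probability("ABA", ...)` multiplies 1.0 by 0.5.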
4
votes
2 answers
153 views

How to improve this Needleman-Wunsch implementation in C#?

I split my implementation of this sequence alignment algorithm into three methods, where the NeedlemanWunsch method makes use of the ScoringFunction and Traceback methods. Further, I decided to go with ...
5
votes
3 answers
157 views

Code optimization for SQLite result set parsing

I am retrieving information from an SQLite database that gives me back around 20 million rows that I need to process. This information is then transformed into a dict of lists which I need to use. I ...
4
votes
1 answer
64 views

Cutting strings into smaller ones based on specific criteria

So, I've got this largish (for me) script, and I want to see if anybody could tell me if there are any ways to improve it, both in terms of speed, amount of code and the quality of the code. I still ...
6
votes
1 answer
101 views

Calculate query coverage from BLAST output

I have a BLAST output file and want to calculate query coverage, appending the query lengths as an additional column to the output. Let's say I have 2 7 15 f=open('file.txt', 'r') ...
5
votes
5 answers
288 views

Bioinformatics: Genome string clump finding problem

I am trying to solve a bioinformatics problem from a Stepic course. The problem posed: find clumps of the same pattern within a longer genome. Motivation: Identifying 3 occurrences of the same ...
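
The excerpt is cut off before the exact problem statement, so the following is only a sketch under one common formulation of clump finding, which is an assumption here: report every k-mer that occurs at least `min_count` times inside some window of `window` characters. The function name and all parameter names are hypothetical.

```python
from collections import defaultdict


def find_clumps(genome, k, window, min_count):
    """Return the set of k-mers forming a clump in genome.

    A k-mer forms a clump if at least min_count of its occurrences fit
    entirely inside one stretch of `window` consecutive characters.
    """
    # Record the start positions of every k-mer, in increasing order.
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    clumps = set()
    for kmer, starts in positions.items():
        # Look at each run of min_count consecutive occurrences.
        for j in range(len(starts) - min_count + 1):
            first = starts[j]
            last = starts[j + min_count - 1]
            # The run fits if it spans at most `window` characters.
            if last + k - first <= window:
                clumps.add(kmer)
                break
    return clumps
```

Indexing positions per k-mer avoids recounting every window from scratch. In `"ACGACGACG"` with `k=3`, the three copies of `ACG` span exactly 9 characters, so they form a clump for `window=9` but not for `window=8`.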
2
votes
3 answers
335 views

Longest DNA sequence that appears at least twice (only one DNA string as input)

My question is to find the longest DNA sub-sequence that appears at least twice. The input is only one DNA string, NOT TWO strings as other LCS programs. I have done my 4th program and it seems to be ...
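
As an illustration of the task in this excerpt (not the asker's program), here is a minimal sketch that finds the longest substring occurring at least twice in a single string. It assumes "sub-sequence" means a contiguous substring and that the two occurrences may overlap. Both are assumptions, and the function name is hypothetical. The sketch binary-searches on the length, which works because any repeated substring of length k contains a repeated substring of length k - 1.

```python
def longest_repeated_substring(dna):
    """Return a longest substring that occurs at least twice in dna.

    Assumptions: substrings are contiguous and occurrences may overlap.
    Returns "" when no substring repeats.
    """

    def repeated_of_length(k):
        # Return some substring of length k seen twice, or None.
        seen = set()
        for i in range(len(dna) - k + 1):
            piece = dna[i:i + k]
            if piece in seen:
                return piece
            seen.add(piece)
        return None

    best = ""
    lo, hi = 1, len(dna) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        found = repeated_of_length(mid)
        if found is not None:
            best = found      # a repeat of this length exists, try longer
            lo = mid + 1
        else:
            hi = mid - 1      # no repeat this long, try shorter
    return best
```

This favours brevity over speed. Suffix arrays or suffix trees are the usual choice for long genomes.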
5
votes
1 answer
232 views

Performance: equivalent C and C++ programs

I write quite a bit of code in C, but haven't done much C++ since my college CS classes. I have been revisiting C++ recently, and thought I would re-implement a program I had previously written in C, ...
4
votes
2 answers
122 views

Efficient parsing of FASTQ

FASTQ is a notoriously bad format. This is because it uses the same @ character for the id line as it does for quality scores. Deciding what is a quality score and what is an id is a tricky endeavor ...
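
The ambiguity described in this excerpt can be sidestepped when every record occupies exactly four lines, because line position then decides what is an id and what is a quality string. The sketch below relies on that assumption. It does not handle files that wrap sequence or quality across several lines, and it is not the approach from the question. The name `read_fastq` is hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (identifier, sequence, quality) tuples from FASTQ lines.

    Assumption: each record is exactly four lines (id, sequence, '+',
    quality), so a quality string starting with '@' is never mistaken
    for an id line.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return  # clean end of input
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual
```

Any iterable of lines works, including an open file object, for example `for name, seq, qual in read_fastq(open(path)): ...` where `path` is a placeholder.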
11
votes
1 answer
869 views

Simple DNA sequence finder w/ mismatch tolerance

The goal with this function is to find one DNA sequence within another sequence, with a specified amount of mismatch tolerance. For example: dnasearch('acc','ccc',0) shouldn't find a match, while ...
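
The behaviour described in this excerpt can be sketched as below. This is an illustration, not the asker's `dnasearch`. The name `find_with_mismatches`, its argument order, and the choice to count only substitutions (no insertions or deletions) are all assumptions.

```python
def find_with_mismatches(pattern, text, max_mismatches):
    """Return the start indices where pattern occurs in text.

    A position counts as a match when at most max_mismatches characters
    differ. Only substitutions are considered, not insertions or
    deletions.
    """
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, text[start:start + m]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, stop comparing here
        else:
            # The inner loop finished without exceeding the tolerance.
            hits.append(start)
    return hits
```

Under these assumptions `find_with_mismatches('acc', 'ccc', 0)` returns an empty list, matching the example in the excerpt, while a tolerance of 1 reports a match at index 0.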
2
votes
2 answers
130 views

Feedback on text parsing and control structures

I threw together this C program today to handle a bioinformatics data processing task. The program seems to work correctly, but I wanted to know if anyone has suggestions regarding how the input data ...