Bioinformatics is the use of software tools to analyse biological data.


0
votes
0 answers
46 views

Traceback in sequence alignment with affine gap penalty (Needleman-Wunsch)

I am working on an implementation of the Needleman-Wunsch sequence alignment algorithm in Python, and I've already implemented the one that uses a linear gap penalty equation for scoring, but now I'm ...
4
votes
4 answers
162 views

Counting nucleobases in a nucleotide

This question is part of a series solving the Rosalind challenges. For the previous question in this series, see Calculating protein mass ruby. The repository with all my up-to-date solutions so far ...
3
votes
1 answer
74 views

Analyzing pair by pair DNA sequences with for loops in Python

Here is code that I need to make much faster: it analyses more than 3000 DNA sequences of more than 100 000 characters each. The problem is that I have to compare them pair by pair, so that would be ...
5
votes
1 answer
80 views

Reading string from a text file and returning the number of occurrences of each substring of length k

This program takes a simple nucleotide sequence and finds the most common "k-mers" in the sequence, as determined by the supplied dataset (see below). The goal of the program is to find the origin of ...
3
votes
1 answer
60 views

Unique nucleotide permutations, Python itertools product

I'm searching for all the unique possible permutations of nucleotides of a given length. By unique, I mean the reverse complement would not be counted. ACGT For example, permutations of length ...
3
votes
1 answer
88 views

Python script which translates a DNA sequence

For my Biology class, I made a Python script which takes a DNA sequence as input, translates it into an mRNA sequence, and then again into a tRNA sequence. It then matches each mRNA codon with an ...
3
votes
3 answers
110 views
1
vote
0 answers
18 views

Parsing GTF file using command-line

I am extracting exon details from a GTF file using Unix command-line tools such as cut, awk, grep or sed. Input file.gtf: ...
1
vote
0 answers
69 views

Edit distance (Optimal Alignment) - follow up

This is a follow-up to the question Optimal substructure of ED. Here is the reasoning behind my solution: let \$x = (\alpha _{1},\alpha _{2},\alpha _{3},...,\alpha _{m})\$ and \$y = (\beta_{1},\...
4
votes
1 answer
63 views

Calculating how similar two objects are according to a database

I want to calculate how similar two objects are according to a database. However the code is very slow; it takes around 2 minutes to analyze just 3 objects. How can I speed it up? I have tried to ...
0
votes
0 answers
601 views

Needleman–Wunsch algorithm in Rust

The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences. Here is an implementation in Rust: ...
2
votes
1 answer
47 views

Returning substrings (motifs) and original strings (sequences) from a file of strings (sequences)

I would like to get some help/tips on writing better and more pythonic code, as well as variable naming. Code info: .backtranscribe() is just a method to convert ...
6
votes
1 answer
87 views

Counting K-mers (words) in a sequence string

I recently found out how to use joblib for parallelization. I want to iterate through this string seq in steps of 1 and count ...
2
votes
2 answers
67 views

Reading genetic data in VCF and Tabix formats using an asynchronous library

I'm working with an open-source library for processing and parsing genetic data in VCF and Tabix formats. It contains functions and classes that make it easy to read an index file (a Tabix) and load ...
6
votes
1 answer
95 views

Calculating protein mass

This question is part of a series solving the Rosalind challenges. For the previous question in this series, see A sequence of mistakes. The repository with all my up-to-date solutions so far can be ...
4
votes
3 answers
103 views

Nucleotide count in Scala

This is my second day learning Scala and I still need to develop a taste for functional programming; I often find myself writing imperative code. Below is the result of my TDD practice. Code ...
5
votes
3 answers
48 views

Counting GUAG introns in chromosomes

I have this code that is working fine, but it's taking pretty much 100% of my CPU to run and it takes around 25 min. I'd really like to optimize it but don't know what parts I could improve. The main ...
2
votes
2 answers
56 views

A sequence of mistakes

This question is part of a series solving the Rosalind challenges. For the previous question in this series, see The Genetic Code. The repository with all my up-to-date solutions so far can be found ...
6
votes
3 answers
494 views

The Genetic Code

This question is part of a series solving the Rosalind challenges. For the previous question in this series, see Wascally wabbits. The repository with all my up-to-date solutions so far can be found ...
2
votes
0 answers
88 views

Needleman Wunsch algorithm in Scala

The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences. Here is an implementation in Scala: ...
2
votes
0 answers
28 views

Adding information to a compressed file and compressing the output

I wrote this script for adding information to a compressed file and compressing the output: ...
1
vote
1 answer
41 views

Compare sequence & maps headers in fasta file

This is the Perl code which compares the sequences in a FASTA file and maps the headers. Although the code works well, I would still like to make it more efficient. Since the files I compare have >...
3
votes
0 answers
55 views

Determining if a genetic sequence is palindromic

Adding another level to my previous question on 'normal' palindrome identification, in this one I'm interested in identifying genetic palindromes. Here's my attempt: ...
1
vote
1 answer
50 views

Collating creature descriptors spread across multiple stanzas

I have been using Python for only a few days, so I am trying to learn about some best practices. An explanation of what this code is supposed to do is at the bottom of this post. It is an exercise to ...
5
votes
1 answer
95 views

VCF parser for eventual genomic data visualization

I've just started out writing an app that will visualize genomic data for anybody to understand. When you get your genome sequenced the raw data usually comes in the form of a VCF file. I started out ...
4
votes
1 answer
135 views

Rosalind problem “Consensus and Profile”

Source: Rosalind ("Consensus and Profile"). Brief summary ...
6
votes
3 answers
80 views

Categorizing gene sequences read from a CSV file

I am relatively new to programming and would love to get some feedback on the following section of my code. ...
3
votes
1 answer
66 views

V Snare T Snare Model

In the beginning, everything is defined to be of value 10, but I have to change them to suit them for different possible values, hence those are changing. I'm a (Im)mature C coder, hence there might ...
2
votes
1 answer
42 views

Compare a sequence with the reference frequency of hexamers

I have written this function (and others similar to it), but I am not sure I am using references to their full power. My current concern is whether I am using too much memory. The subroutine ...
4
votes
2 answers
232 views

DNA base pair match counter

So my code is done and it outputs exactly what it needs to; I'm just wondering if it is possible to make this code a lot simpler using objects. If so, could someone tell me what I would need member-wise ...
3
votes
2 answers
79 views

Rosalind string algorithm problems

I've been starting to learn Rust by going through some of the Rosalind String Algorithm problems. If anyone would like to point out possible improvements, or anything else, that would be great. There ...
7
votes
3 answers
180 views

Prefix Sum in Ruby, Genomic Range Query from Codility

I'm currently going through some lessons on Codility. I've just spent a couple of hours with GenomicRangeQuery, which is intended to demonstrate the use of prefix sums. The task description is here. ...
6
votes
1 answer
361 views

High performance parsing for large, well-formatted text files

I am looking to optimize the performance of a big data parsing problem I have using Python. The example data I show are segments of whole genome DNA sequence alignments for six primate species. Each ...
6
votes
3 answers
92 views

Building a report of DNA sites and chunks

Here is the slow part of my code: ...
4
votes
1 answer
95 views

Find allele frequencies at each site for each iteration for each population from FASTA file

The script takes a FASTA format file as input and outputs the frequencies of each amino acid (A, C, ...
0
votes
2 answers
1k views

Comparing two columns in two different rows

I want to go through each line of a .csv file and compare to see if the first field of line 1 is the same as the first field of the next line, and so on. If it finds a match then I would like to ignore ...
2
votes
2 answers
80 views

RNA/DNA transcriber

I've been going through some of the exercises over on exercism and this is one of my solutions: a basic RNA/DNA transcriber. I was happy enough at first but now, looking at it again, the solution ...
4
votes
2 answers
143 views

Fast comparison of molecular structures and deleting duplicates

I have a program that reads in two xyz-files (molecular structures) and compares them by an intramolecular distance measure (dRMSD, Fig. 22). A friend told me that my program structure is bad, and as ...
5
votes
2 answers
68 views

Converting domain-specific regular-expressions to a list of all matching instances

There seem to be several questions floating around Stack Exchange regarding how to take a Python regular expression and list the matching instances. This problem is a bit different because 1) I need to ...
7
votes
1 answer
151 views

Statistics about gaps in DNA sequences

Noobie to Numba here. I'm trying to get faster code from an existing function, but the result is not faster. 10 times faster would be heaven, but I know nothing about optimization. This is code about ...
3
votes
1 answer
450 views

Python Longest Repeat

I am trying to find the longest repeated string in text with Python, both quickly and space efficiently. I created an implementation of a suffix tree in order to make the processing fast, but the ...
4
votes
2 answers
211 views

bash script for constructing RNA pipeline

I have written a bash script that consists of multiple commands and Python scripts. The goal is to make a pipeline for detecting long non-coding RNA from a certain input. Ultimately I would like to ...
5
votes
1 answer
311 views

Reading an Excel file and comparing the amino acid sequence of each data pair

Since I am fairly new to Python I was wondering whether anyone can help me by making the code more efficient. I know the output stinks; I will be using Pandas to make this a little nicer. ...
1
vote
2 answers
121 views

Counting adenine and cytosine bases

I've started a little challenge on a website, and the first one was about counting different DNA letters. I've done it, but I found my method very brutal. I have a little experience, and I know that ...
4
votes
2 answers
325 views

Reflecting emotion classification based on the Lövheim cube

Background: I created a simple class to reflect emotion classification based on the Lövheim cube. The code is not scientific at all, and I just did it for fun, but I want all code I write to be as ...
2
votes
1 answer
45 views

A Java class for reading MaCH dosage files v2.0

Version 2 of A Java class for reading MaCH dosage files ...
3
votes
1 answer
80 views

A Java class for reading MaCH dosage files

A dosage file (used in computational genetics) is formatted like this: ...
4
votes
2 answers
166 views

Convert impute2 files to mach format

Here is a program for converting Impute2 files into MaCH format (related to genetics). Source files include one xxx_haps file and one xxx_samples file, for example: ...
3
votes
0 answers
78 views

Finding the Cox regression coefficients in a mixed model for microarray data

I have written code for a project which aims at finding the Cox regression coefficients in a mixed model for microarray data. The study was carried out on the Affymetrix Hgu133a platform. In the ...
2
votes
1 answer
238 views

Slow Python text-processing script

This script of mine merges columns 1 and 2 from one input file and sees if these merged combinations exist in the other infile (and vice versa). I know I get stuck in appending. It did not get past ...