Recently Active 'bioinformatics' Questions

2

votes

0 answers

16 views

Implementation of fast optimal global sequence alignment algorithm

The link to the paper, where the algorithm is explained in depth and its optimality and termination are proven, can be found here. I also found a C++ implementation, which is a bit unreadable and does ...

bioinformatics julia

modified 2 days ago

kjenova
113

8

votes

1 answer

317 views

Genetic Algorithm in Python

I'm a new programmer, so any help is welcome. Preferably to make it faster, avoid heavy memory usage, and so on. ...

modified Mar 14 at 22:03

Hosch250
5,07011459

3

votes

1 answer

65 views

Python Longest Repeat

I am trying to find the longest repeated string in text with python, both quickly and space efficiently. I created an implementation of a suffix tree in order to make the processing fast, but the ...

python algorithm strings tree bioinformatics

modified Mar 10 at 13:23

Janne Karila
6,242719

4

votes

1 answer

125 views

Cutting strings into smaller ones based on specific criteria

I've got this largish (for me) script, and I want to see if anybody could tell me if there are any ways to improve it, both in terms of speed, amount of code and the quality of the code. I still ...

python strings python3 csv bioinformatics

modified Feb 28 at 20:40

Jamal♦
23.3k678170

4

votes

2 answers

82 views

bash script for constructing RNA pipeline

I have written a bash script that consists of multiple commands and Python scripts. The goal is to make a pipeline for detecting long non-coding RNA from a certain input. Ultimately I would like to ...

bash bioinformatics

modified Feb 28 at 7:48

janos
40.9k445180

5

votes

1 answer

54 views

Reading an Excel file and comparing the amino acid sequence of each data pair

Since I am fairly new to Python I was wondering whether anyone can help me by making the code more efficient. I know the output stinks; I will be using Pandas to make this a little nicer. ...

python beginner excel bioinformatics pandas

modified Feb 22 at 18:53

200_success♦
58.7k471226

1

vote

2 answers

73 views

Counting adenine and cytosine bases

I've started a little challenge on a website, and the first one was about counting different DNA letters. I've done it, but I found my method very brutal. I have a little experience, and I know that ...

python beginner bioinformatics

modified Feb 16 at 0:23

200_success♦
58.7k471226

2

votes

2 answers

146 views

Data processing task for bioinformatics

I threw together this C program today to handle a bioinformatics data processing task. The program seems to work correctly, but I wanted to know if anyone has suggestions regarding how the input data ...

c parsing bioinformatics

modified Jan 30 at 14:02

Jamal♦
23.3k678170

5

votes

1 answer

99 views

Case study with a biological population: a list of lists of lists

I have a population (Pop) which has an attribute which is a list of individuals (Ind) where each individual has an attribute ...

python oop bioinformatics

modified Jan 18 at 18:43

Jamal♦
23.3k678170

4

votes

2 answers

144 views

Reflecting emotion classification based on the Lövheim cube

Background: I created a simple class to reflect emotion classification based on the Lövheim cube. The code is not scientific at all, and I just did it for fun, but I want all code I write to be as ...

python classes python-2.7 bioinformatics

modified Dec 30 '14 at 10:05

Josay
14.5k1755

2

votes

1 answer

36 views

A Java class for reading MaCH dosage files v2.0

Version 2 of A Java class for reading MaCH dosage files ...

java parsing io bioinformatics

modified Nov 18 '14 at 14:10

Nihathrael
551112

3

votes

1 answer

62 views

A Java class for reading MaCH dosage files

A dosage file (used in computational genetics) is formatted like this: ...

java parsing io bioinformatics

modified Nov 16 '14 at 19:02

janos
40.9k445180

8

votes

3 answers

529 views

Statistical calculations with sets of genes

The following piece of code executes 20 million times each time the program is called, so I need a way to make this code as optimized as possible. ...

c++ performance statistics bioinformatics

modified Nov 11 '14 at 2:48

200_success♦
58.7k471226

4

votes

2 answers

50 views

Convert impute2 files to mach format

Here is a program for converting Impute2 files into MaCH format (related to genetics). Source files include one xxx_haps file and one xxx_samples file, for example: ...

java converting file io bioinformatics

modified Nov 7 '14 at 21:38

toto2
3,828716

7

votes

4 answers

2k views

Genome string clump finding problem

I am trying to solve a bioinformatics problem from a Stepic course. The problem posed: find clumps of the same pattern within a longer genome. Motivation: Identifying 3 occurrences of the same ...

python performance bioinformatics

modified Oct 30 '14 at 22:56

marius_neo
183

7

votes

6 answers

341 views

Explicit Function Notation in Perl

I've gone back and forth a few times recently on my Perl coding style when it comes to module subroutines. If you have an object and you want to call the method bar ...

perl bioinformatics

modified Oct 29 '14 at 19:03

200_success♦
58.7k471226

5

votes

4 answers

309 views

Generating DNA sequences and looking for correlations

I've written a script to generate DNA sequences and then count the appearance of each step to see if there is any long range correlation. My program runs really slow for a length 100000 sequence 100 ...

python beginner algorithm bioinformatics

modified Oct 22 '14 at 5:49

Jamal♦
23.3k678170

3

votes

0 answers

62 views

Finding the Cox regression coefficients in a mixed model for microarray data

I have written code for a project that aims to find the Cox regression coefficients in a mixed model for microarray data. The study was carried out on the Affymetrix Hgu133a platform. In the ...

performance beginner r bioinformatics

modified Oct 7 '14 at 9:03

Quaxton Hale
2,0032350

3

votes

1 answer

88 views

Calculating overlap of segments in chromosome data

I wrote R code that basically performs 2 operations: For each segment in file A, find all segments in file B that lie in that segment. Find the percentage of overlap for each case in previous ...

r performance bioinformatics

modified Sep 29 '14 at 17:01

Jamal♦
23.3k678170

2

votes

1 answer

177 views

Slow Python text-processing script

This script of mine merges columns 1 and 2 from one input file and sees if these merged combinations exist in the other infile (and vice versa). I know I get stuck in appending. It did not get past ...

python csv file bioinformatics time-limit-exceeded

modified Sep 2 '14 at 11:57

200_success♦
58.7k471226

4

votes

0 answers

64 views

Vectorize Fisher's Exact Test

I have two data frames/ lists of data, humanSplit and ratSplit, and they are of the form ...

optimization csv r statistics bioinformatics

modified Aug 10 '14 at 14:47

Jamal♦
23.3k678170

5

votes

2 answers

144 views

Finding database matches and storing them in a glycopeptide structure

I am relatively new to C and would like some feedback on a function that I have written: whether it adheres to C standards, and whether there are other things I could have done better or differently. ...

beginner c mysql data-structures bioinformatics

modified Jul 22 '14 at 4:35

Jamal♦
23.3k678170

0

votes

1 answer

116 views

Faster way to parse file to array, compare to array in second file, write final file

I currently have an MGF file containing MS2 spectral data (QE_2706_229_sequest_high_conf.mgf). The file template is here, as well as a snippet of example: ...

python performance bioinformatics

modified Jul 15 '14 at 16:47

Jamal♦
23.3k678170

6

votes

2 answers

157 views

Comparing 2 lists of peptide to spectrum rankings generated by 2 different algorithms

I'm seeking a general review, but I'm particularly interested in style. This program gets 2 lists of peptide to spectrum matches, so every spectrum title is linked to a list of 1 or 10 possible ...

java bioinformatics

modified Jun 14 '14 at 8:16

Jamal♦
23.3k678170

10

votes

3 answers

696 views

Counting DNA nucleotides in C

I have written code to solve the following Rosalind problem. This is my first time writing in C and I would like a review of my code, particularly in regard to correctness and performance. ...

c beginner bioinformatics

modified Jun 13 '14 at 16:51

Jamal♦
23.3k678170

1

vote

2 answers

201 views

Parsing BLAST output in XML format using Regular Expression

There are many other, better ways to parse BLAST output in .xml format, but I was curious to try using regex, even if it is not so straightforward and common. Here is the code to extract translated ...

python regex bioinformatics

modified May 29 '14 at 6:33

Tomek Wyderka
1112

5

votes

3 answers

529 views

Optimization for SQLite result set parsing

I am retrieving information from an SQLite database that gives me back around 20 million rows that I need to process. This information is then transformed into a dict of lists which I need to use. I ...

python python3 sqlite generator bioinformatics

modified May 28 '14 at 19:01

Jamal♦
23.3k678170

3

votes

2 answers

211 views

Rosalind's 3rd problem in Scheme

I have an imperative programming background and I've decided to study functional programming by applying it to problems found on sites such as Project Euler and Rosalind. My language of choice is ...

beginner scheme bioinformatics

modified May 13 '14 at 6:46

Anonymous
1,65811

4

votes

2 answers

93 views

Data screening using Perl

Background information: I've been asked to write a little Perl script that allows genomic data to be screened against reference files in order to determine locations of specific mutations. The input ...

perl csv bioinformatics join

modified Apr 26 '14 at 2:27

Edward
9,81611467

6

votes

2 answers

86 views

Foreach-loop for and print commands

How can I make the following code shorter or efficient (maybe with other loops or other nice ideas), and keep the current functionality? ...

loop perl bioinformatics

modified Mar 31 '14 at 18:33

mpapec
45316

5

votes

2 answers

282 views

Genomic Range Query

Recently I worked on one of the Codility training tasks, Genomic Range Query (please refer to one of the evaluation reports for the details of this task). The proper approach for this question is using ...

c complexity programming-challenge bioinformatics

modified Mar 30 '14 at 4:39

syb0rg
11.3k347124

10

votes

2 answers

846 views

Calculating the joint probability of n events from a sample sequence of occurrences

I'm writing an algorithm to take in a sample list of sequences of events, calculate 1-step transitional probabilities from the sequences, forward or in reverse, then calculate the joint probability of ...

python algorithm numpy bioinformatics

modified Mar 22 '14 at 0:34

Gareth Rees
12.2k12451

4

votes

2 answers

741 views

How to improve this Needleman-Wunsch implementation in C#?

I split my implementation of this sequence alignment algorithm into three methods, where the NeedlemanWunsch method makes use of the ScoringFunction and Traceback methods. Further, I decided to go with ...

c# algorithm bioinformatics

modified Feb 16 '14 at 10:34

thomasch
642

6

votes

1 answer

650 views

Calculate query coverage from BLAST output

I have a BLAST output file and want to calculate query coverage, appending the query lengths as an additional column to the output. Let's say I have 2 7 15 ...

python regex linux csv bioinformatics

modified Jan 28 '14 at 13:48

Gareth Rees
12.2k12451

11

votes

1 answer

2k views

Simple DNA sequence finder w/ mismatch tolerance

The goal with this function is to find one DNA sequence within another sequence, with a specified amount of mismatch tolerance. For example: ...

python bioinformatics

modified Dec 4 '13 at 15:39

Jamal♦
23.3k678170

2

votes

3 answers

986 views

Longest DNA sequence that appears at least twice (only one DNA string as input)

My question is to find the longest DNA sub-sequence that appears at least twice. The input is only one DNA string, NOT TWO strings as other LCS programs. I have done my 4th program and it seems to be ...

java strings bioinformatics

modified Nov 23 '13 at 17:19

Jamal♦
23.3k678170

5

votes

1 answer

488 views

Performance: equivalent C and C++ programs

I write quite a bit of code in C, but haven't done much C++ since my college CS classes. I have been revisiting C++ recently, and thought I would re-implement a program I had previously written in C, ...

c++ performance parsing c++11 bioinformatics

modified Nov 11 '13 at 18:41

Michael Urman
3,2901221

4

votes

2 answers

602 views

Efficient parsing of FASTQ

FASTQ is a notoriously bad format. This is because it uses the same @ character for the id line as it does for quality scores. Deciding what is a quality score and ...

python parsing bioinformatics

modified Oct 22 '13 at 11:07

Gareth Rees
12.2k12451
