The Computer Language Benchmarks Game has about a dozen tasks, each with an input file, expected results, and solutions in many programming languages.
For example, the k-nucleotide benchmark (Hashtable update and k-nucleotide strings) is defined like this:
We use FASTA files generated by the fasta benchmark as input for this benchmark.
Note: the file may include both lowercase and uppercase codes.
Each program should:
1. read line-by-line a redirected FASTA format file from stdin
2. extract DNA sequence THREE
3. define a procedure/function to update a hashtable of k-nucleotide keys and count
   values, for a particular reading-frame, even though we'll combine k-nucleotide
   counts for all reading-frames (grow the hashtable from a small default size)
4. use that procedure/function and hashtable to
   - count all the 1-nucleotide and 2-nucleotide sequences, and write the code and
     percentage frequency, sorted by descending frequency and then ascending
     k-nucleotide key
   - count all the 3-, 4-, 6-, 12- and 18-nucleotide sequences, and write the count and
     code for the specific sequences GGT GGTA GGTATT GGTATTTTAATT GGTATTTTAATTTATAGT
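
The four steps above can be sketched in Python roughly as follows. This is a minimal, unoptimized illustration, not one of the benchmark's submitted solutions. All function names are made up for this example. It assumes that sequence THREE is introduced by a header line starting with ">THREE", that mixed-case codes should be folded to uppercase, and that percentages are printed with three decimals and counts are separated from the code by a tab; check the exact output format against the expected results file. A Python dict starts small and grows as keys are added, which is taken here to satisfy the "small default size" requirement.

```python
import sys
from collections import defaultdict


def read_sequence(lines, name="THREE"):
    """Steps 1-2: scan line by line and return the sequence after '>NAME'."""
    parts = []
    in_target = False
    for line in lines:
        if line.startswith(">"):
            if in_target:
                break  # the next record begins, so the target is complete
            in_target = line[1:].startswith(name)  # assumed header form
        elif in_target:
            parts.append(line.strip().upper())  # input may be mixed case
    return "".join(parts)


def update_counts(table, sequence, k, frame):
    """Step 3: count the k-mers of one reading frame (frame, frame+k, ...)."""
    for i in range(frame, len(sequence) - k + 1, k):
        table[sequence[i:i + k]] += 1


def count_kmers(sequence, k):
    """Combine the counts of all k reading frames in a single table."""
    table = defaultdict(int)
    for frame in range(k):
        update_counts(table, sequence, k, frame)
    return table


def frequency_lines(sequence, k):
    """Step 4a: 'code percentage', by descending count then ascending key."""
    table = count_kmers(sequence, k)
    total = sum(table.values())
    ordered = sorted(table.items(), key=lambda kv: (-kv[1], kv[0]))
    return ["%s %.3f" % (key, 100.0 * n / total) for key, n in ordered]


def count_line(sequence, fragment):
    """Step 4b: 'count<TAB>code' for one specific k-nucleotide sequence."""
    table = count_kmers(sequence, len(fragment))
    return "%d\t%s" % (table.get(fragment, 0), fragment)


def main(stream=sys.stdin):
    """Hypothetical driver: read the FASTA text from stream and print results."""
    seq = read_sequence(stream)
    for k in (1, 2):
        print("\n".join(frequency_lines(seq, k)))
        print()
    for fragment in ("GGT", "GGTA", "GGTATT", "GGTATTTTAATT",
                     "GGTATTTTAATTTATAGT"):
        print(count_line(seq, fragment))
```

Calling main() from a script and redirecting a FASTA file to stdin would produce the two frequency tables followed by the five counts. Splitting the work by reading frame is only needed because the task description asks for it; counting every k-mer in a single pass over the sequence gives the same totals.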