**Note: I might put this on Code Golf too, but first I'd like to get this code reviewed.
So there's this programming contest my school will be holding soon, which is about solving problems in the shortest (writing) time possible. I want to participate, but I discovered that I suck with puzzles. So today I begun doing a few example problems from past competitions.
Problem Description
A genome is a sequence of characters (A,C,G or T). Sometimes, many genomes share the same sequence of characters:
- ACCC GTT
- A GTT AAC
Given a set of genomes, you will have to find the size of the largest character sequence that is repeated in all given genomes.
Input
The first line is an integer, which just tells you the number of genomes you will be given. The next lines are the genome strings themselves (one per line).
Example:
2
ACGGGCGTCGTCCCCGTCGTCGTATC
CTCGTCGTCCCCGTCGTCGTGTC
Output
Integer representing the size of the largest character sequence you found in all given genomes.
Example (based on previous input):
18
My Analysis
Not gonna lie: I'm not good with clever math voodoo. I just decided to create all possible sequences and then grab the largest from that is found in all genomes. The key to that was using std::string::find()
, which I guess is not the brightest idea ever.
My Solution You may notice that my variables and such are in Spanish. Yeah.
#include <iostream>
#include <deque>
using namespace std;
int main() {
// Read data
int cantidad; cin >> cantidad; // Number of genomes
deque<string> genomas; // Deque of genomes
string menorString; // Smallest genome
for (int i = 0; i < cantidad; ++i) {
string in; cin >> in; // Read a genome
genomas.push_back(in); // Send to deque
// Check if this is the smallest genome so far
if (i == 0) {
menorString = in;
}else {
if (in.length() < menorString.length()){
menorString = in;
}
}
}
// Create all possible sequences
deque<string> sequences; // Deque for sequences
unsigned long size = menorString.length();
// We will use the smallest genome found
for (int a = 0; a < size; ++a) {
// For each char in this genome...
// We begin building a sequence using this char.
string temp = string(1,menorString.at(a));
// And store the sequence
sequences.push_back(temp);
for (int b = a + 1; b < size; ++b) {
// For each character AFTER this character...
// We append it to our current sequence...
temp.append(1,menorString.at(b));
// And store it.
sequences.push_back(temp);
}
}
// Now time to find the largest sequence found in ALL genomes.
string largestSequence;
for (int i = 0; i < sequences.size(); ++i) {
// For each sequence...
bool genomas_ok = true;
for (int g = 0; g < genomas.size() && genomas_ok; ++g) {
// We check all genomes...
string genoma = genomas[g];
// We search for the sequence to be within the genome
if (genoma.find(sequences[i]) == string::npos) {
// Sequence not found in genome. Discard this sequence.
genomas_ok = false;
}
}
if (genomas_ok) {
// All genomes do have this sequence!
// Check if is the largest one we got so far...
if (i == 0) {
largestSequence = sequences[i];
}else{
if (sequences[i].size() > largestSequence.size()) {
largestSequence = sequences[i];
}
}
}
}
// Print size of largest genome.
cout << largestSequence.size();
return 0;
}
Does it work?
Well, it seems to work. It works for the example given above, and with a few of my own tests.
What I need
This code looks pretty large. Not good for a speed contest (I suppose). Can you help me figure out improvements for this? Namely, improvements that make the code smaller - not necessarily about efficiency (but it is appreciated as well!).
I suppose that commenting almost every line is not good for a contest either. Haha...