TL;DR
- Recast your task as testing an implementation of an API.
- As the algorithm is expressed in terms of concepts (hypothesis, translation option) that are themselves quite complicated, program it bottom-up by first developing (using TDD) classes that encapsulate those concepts.
- As the algorithm is not language specific, but abstracts language specific operations (all translation options, if applicable), encapsulate the language specific parts in a separate object, use dependency injection for that object, and test using simple fake implementations of that object.
- Use your knowledge of the algorithm to suggest good test cases.
Test the API
By focusing on the implementation (the algorithm), you are making a mistake. Instead, first imagine you had a magic class which did the job that the algorithm performs. What would its API be? What would its inputs be, what would its outputs be? And what would the required connections between the inputs and outputs be. You want to encapsulate your algorithm in this class, and recast your problem as generating this class.
In this case, its seems the input is a sentence that has been tokenized (split into words), and the output is a tokenized sentence , which has been translated into a different language. So I guess the API is something like this:
interface Translator {
/**
* Translate a tokenized sentence from one language to another.
*
* @param original
* The sentence to translate, split into words,
* in the language of the {@linkplain #getTranslatesFrom() locale this
* translates from}.
* @return The translated sentence, split into words,
* in the language of the {@linkplain #getTranslatesTo() locale this
* translates to}. Not null; containing no null or empty elements.
* An empty list indicates that the translator was unable to translate the
* given sentence.
*
* @throws NullPointerException
* If {@code original} is null, or contains a null element.
* @throws IllegalArgumentException
* If {@code original} is empty or has any empty elements.
*/
public List<String> translate(List<String> original);
public Locale getTranslatesFrom();
public Locale getTranslatesTo();
}
That is, an example of the Strategy design pattern. So your task becomes, not "how do I use TDD to implement this algorithm" but rather "how do I use TDD to implement a particular case of teh Strategy design pattern".
Next you need to think up a sequence of test cases, from easiest to hardest, that use this API. That is, a set of original sentence values to pass to the translate
method. For each of those inputs, you must give a set of constraints on the output. The translation must satisfy those constraints. Note that you already have some constraints on the output:
Not null; not empty; containing no null or empty elements.
You will want some example sentences for which you know for sure what the algorithm should output. I suspect you will find there are very few such sentences. Arrange these tests from easiest to pass to hardest to pass. This becomes your TODO list while you are implementing the Translator
class.
Implement Bottom Up
You will find that making your code passes more than a couple of these cases very hard. So how can you thoroughly test your code?
Look again at the algorithm. It is complicated enough that the translate
method does not do all the work directly itself. It will delegate to other classes for much of the work
place empty hypothesis into stack 0
Do you need a Hypothesis
class. A HypothesisStack
class?
for all translation options do
Do you need a TranslationOption
class?
if applicable then
Is there a method TranslationOption.isApplicable(...)
?
recombine with existing hypothesis if possible
Is there a Hypothesis.combine(Hypothesis)
method? A Hypothesis.canCombineWith(Hypothesis)
method?
prune stack if too big
Is there a HypothesisStack.prune()
method?
Your implementation will likely need additional classes. You can implement each of those individually using TDD. Your few tests of the Translator
class will end up being integration tests. The other classes will be easier to test than the Translator
because they will have precisely defined, narrow definitions of what they ought to do.
Therefore, defer implementing Translator
until you have implemented those classes it delegates to. That is, I recommend that you write your code bottom up rather than top down. Writing code that implements the algorithm you have given becomes the last step. At that stage you have classes that you can use to write the implementation using Java code that looks very like the pseudo code of your algorithm. That is, the body of your translate
method will be only about 13 lines long.
Push Real-Life Complications into an Associated Object
Your translation algorithm is general purpose; it can be used to translate between any pair of languages. I guess that what makes it applicable to translating a particular pair of languages is for all translation options
and if applicable then
parts. I guess the latter could be implemented by a TranslationOption.isApplicable(Hypothesis)
method. So what makes the algorithm specific to a particular language is generating the translation options. Abstract that to a a factory object that the class delegates to. Something like this:
interface TranslationOptionGenerator {
Collection<TranslationOption> getOptionsFor(Hypothesis h, List<String> original);
}
Now so far you have probably thinking in terms of translating between real languages, with all their nasty complexities. But you do not need that complexity to test your algorithm. You can test it using a fake pair of languages, which are much simpler than real languages. Or (equivalently) using a TranslationOptionGenerator
that is not as rich as a practical one. Use dependency injection to associate the Translator
with a TranslationOptionGenerator
.
Use Simple Fake Implementations instead of Realistic Associates
Now consider some of the most extreme simple cases of a TranslationOptionGenerator
that the algorithm must deal with:
- Such as when there are no options for the empty hypothesis
- There is only one option
- There are only two options.
- An English to Harappan translator. Nobody knows how to translate to Harappan, so this translator indicates it is unable to translate every time.
- The identity translator: a translator that translates English to English.
- A Limited English to newborn translator: it can translate only the sentences "I am hungry", "I need changing" and "I am tired", which it translates to "Waaaa!".
- A simple British to US English translator: most words are the same, but a few have different spellings.
You can use this to generate test cases that do not require some of the loops, or do not require the isApplicable
test to be made.
Add those test cases to your TODO list. You will have to write fake simple TeranslatorOptionGenerator
objects for use by those test cases.