
I have a fairly complex algorithm I would like to implement (in Java) with TDD. The algorithm, for translation of natural languages, is called stack decoding.

When I tried to do that I was able to write and pass some simple test cases (empty translation, one word, etc.), but I can't work my way toward the algorithm I actually want. That is, I can't figure out how to implement the bulk of the algorithm described below in baby steps.

This is the pseudo code of the algorithm:

1: place empty hypothesis into stack 0
2: for all stacks 0...n − 1 do
3: for all hypotheses in stack do
4: for all translation options do
5: if applicable then
6: create new hypothesis
7: place in stack
8: recombine with existing hypothesis if possible
9: prune stack if too big
10: end if
11: end for
12: end for
13: end for

Am I missing any way I can do the baby steps, or should I just get some coverage and do the major implementation?

3 Answers


TL;DR

  1. Recast your task as testing an implementation of an API.
  2. As the algorithm is expressed in terms of concepts (hypothesis, translation option) that are themselves quite complicated, program it bottom-up by first developing (using TDD) classes that encapsulate those concepts.
  3. As the algorithm is not language specific, but abstracts language specific operations (all translation options, if applicable), encapsulate the language specific parts in a separate object, use dependency injection for that object, and test using simple fake implementations of that object.
  4. Use your knowledge of the algorithm to suggest good test cases.

Test the API

By focusing on the implementation (the algorithm), you are making a mistake. Instead, first imagine you had a magic class which did the job that the algorithm performs. What would its API be? What would its inputs be, what would its outputs be? And what would the required connections between the inputs and outputs be? You want to encapsulate your algorithm in this class, and recast your problem as creating this class.

In this case, it seems the input is a sentence that has been tokenized (split into words), and the output is a tokenized sentence, which has been translated into a different language. So I guess the API is something like this:

 interface Translator {
   /**
    * Translate a tokenized sentence from one language to another.
    *
    * @param original
    *    The sentence to translate, split into words,
    *    in the language of the {@linkplain #getTranslatesFrom() locale this
    *    translates from}.
    * @return The translated sentence, split into words,
    *    in the language of the {@linkplain #getTranslatesTo() locale this
    *    translates to}. Not null; containing no null or empty elements.
    *    An empty list indicates that the translator was unable to translate the
    *    given sentence.
    *
    * @throws NullPointerException
    *    If {@code original} is null, or contains a null element.
    * @throws IllegalArgumentException
    *    If {@code original} is empty or has any empty elements.
    */
   public List<String> translate(List<String> original);

   public Locale getTranslatesFrom();

   public Locale getTranslatesTo();
 }

That is, an example of the Strategy design pattern. So your task becomes, not "how do I use TDD to implement this algorithm" but rather "how do I use TDD to implement a particular case of the Strategy design pattern".

Next you need to think up a sequence of test cases, from easiest to hardest, that use this API. That is, a set of original sentence values to pass to the translate method. For each of those inputs, you must give a set of constraints on the output. The translation must satisfy those constraints. Note that you already have some constraints on the output:

Not null; containing no null or empty elements.

You will want some example sentences for which you know for sure what the algorithm should output. I suspect you will find there are very few such sentences. Arrange these tests from easiest to pass to hardest to pass. This becomes your TODO list while you are implementing the Translator class.
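One entry on that TODO list can be sketched as a plain constraint-style test. The `IdentityTranslator` below is a hypothetical stand-in (the English-to-English case mentioned later), used only to show what asserting the output constraints looks like; a real stack decoder would replace it.

```java
import java.util.List;
import java.util.Locale;

// The Translator API from above, plus a trivial identity implementation
// so the constraint checks have something concrete to run against.
interface Translator {
    List<String> translate(List<String> original);
    Locale getTranslatesFrom();
    Locale getTranslatesTo();
}

class IdentityTranslator implements Translator {
    public List<String> translate(List<String> original) {
        if (original == null) throw new NullPointerException("original");
        if (original.isEmpty()) throw new IllegalArgumentException("empty sentence");
        for (String word : original) {
            if (word == null) throw new NullPointerException("null word");
            if (word.isEmpty()) throw new IllegalArgumentException("empty word");
        }
        return original; // identity translation: output equals input
    }
    public Locale getTranslatesFrom() { return Locale.ENGLISH; }
    public Locale getTranslatesTo() { return Locale.ENGLISH; }
}
```

A test then passes an input sentence and asserts both the general contract (not null, no null or empty elements) and the case-specific expectation (here, output equals input).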

Implement Bottom Up

You will find that making your code pass more than a couple of these cases is very hard. So how can you thoroughly test your code?

Look again at the algorithm. It is complicated enough that the translate method does not do all the work directly itself. It will delegate to other classes for much of the work.

place empty hypothesis into stack 0

Do you need a Hypothesis class? A HypothesisStack class?

for all translation options do

Do you need a TranslationOption class?

if applicable then

Is there a method TranslationOption.isApplicable(...)?

recombine with existing hypothesis if possible

Is there a Hypothesis.combine(Hypothesis) method? A Hypothesis.canCombineWith(Hypothesis) method?

prune stack if too big

Is there a HypothesisStack.prune() method?

Your implementation will likely need additional classes. You can implement each of those individually using TDD. Your few tests of the Translator class will end up being integration tests. The other classes will be easier to test than the Translator because they will have precisely defined, narrow definitions of what they ought to do.
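To make "easier to test" concrete, here is a sketch of one such bottom-up piece: a `HypothesisStack` with the `prune()` behaviour from line 9 of the pseudo code. The `Hypothesis` shape, the capacity, and the score-based pruning rule are all assumptions for illustration; a real decoder would prune on model scores (histogram or threshold pruning).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal hypothetical Hypothesis: just a score for pruning purposes.
class Hypothesis {
    final double score;
    Hypothesis(double score) { this.score = score; }
}

class HypothesisStack {
    private final int capacity;
    private final List<Hypothesis> hypotheses = new ArrayList<>();

    HypothesisStack(int capacity) { this.capacity = capacity; }

    void add(Hypothesis h) { hypotheses.add(h); }

    int size() { return hypotheses.size(); }

    List<Hypothesis> contents() { return List.copyOf(hypotheses); }

    // "prune stack if too big": keep only the best-scoring hypotheses
    void prune() {
        if (hypotheses.size() <= capacity) return;
        hypotheses.sort(Comparator.comparingDouble((Hypothesis h) -> h.score).reversed());
        hypotheses.subList(capacity, hypotheses.size()).clear();
    }
}
```

A test for this class needs no language data at all: add three hypotheses to a capacity-two stack, prune, and assert that only the two best survive.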

Therefore, defer implementing Translator until you have implemented those classes it delegates to. That is, I recommend that you write your code bottom up rather than top down. Writing code that implements the algorithm you have given becomes the last step. At that stage you have classes that you can use to write the implementation using Java code that looks very like the pseudo code of your algorithm. That is, the body of your translate method will be only about 13 lines long.
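To show how closely that final step can mirror the pseudo code, here is a sketch of such a translate method. Every class shape here (`Hypothesis`, `TranslationOption`, `StackDecoder`, the single-word coverage rule) is a hypothetical minimal stand-in, restated so the snippet is self-contained; recombination and pruning (lines 8–9) are deliberately omitted.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class Hypothesis {
    final List<String> words;      // translated words so far
    final int covered;             // number of source words consumed
    Hypothesis(List<String> words, int covered) {
        this.words = words; this.covered = covered;
    }
}

class TranslationOption {
    final String sourceWord, targetWord;
    TranslationOption(String s, String t) { sourceWord = s; targetWord = t; }
    // "if applicable": here, the option must match the next uncovered word
    boolean isApplicable(Hypothesis h, List<String> original) {
        return h.covered < original.size()
            && original.get(h.covered).equals(sourceWord);
    }
}

interface TranslationOptionGenerator {
    Collection<TranslationOption> getOptionsFor(Hypothesis h, List<String> original);
}

class StackDecoder {
    private final TranslationOptionGenerator options;
    StackDecoder(TranslationOptionGenerator options) { this.options = options; }

    List<String> translate(List<String> original) {
        int n = original.size();
        // stacks.get(i) holds hypotheses covering i source words
        List<List<Hypothesis>> stacks = new ArrayList<>();
        for (int i = 0; i <= n; i++) stacks.add(new ArrayList<>());
        stacks.get(0).add(new Hypothesis(List.of(), 0));          // 1: empty hypothesis
        for (int i = 0; i < n; i++) {                             // 2: for all stacks
            for (Hypothesis h : stacks.get(i)) {                  // 3: for all hypotheses
                for (TranslationOption o : options.getOptionsFor(h, original)) { // 4
                    if (o.isApplicable(h, original)) {            // 5
                        List<String> w = new ArrayList<>(h.words);
                        w.add(o.targetWord);
                        stacks.get(i + 1).add(new Hypothesis(w, i + 1)); // 6, 7
                        // 8, 9: recombination and pruning omitted in this sketch
                    }
                }
            }
        }
        return stacks.get(n).isEmpty() ? List.of() : stacks.get(n).get(0).words;
    }
}
```

Note that the loop body is nearly a line-for-line transcription of the pseudo code, and that the empty-list return matches the "unable to translate" convention in the Translator contract.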

Push Real-Life Complications into an Associated Object

Your translation algorithm is general purpose; it can be used to translate between any pair of languages. I guess that what makes it applicable to translating a particular pair of languages is the for all translation options and if applicable then parts. I guess the latter could be implemented by a TranslationOption.isApplicable(Hypothesis) method. So what makes the algorithm specific to a particular language is generating the translation options. Abstract that to a factory object that the class delegates to. Something like this:

 interface TranslationOptionGenerator {

    Collection<TranslationOption> getOptionsFor(Hypothesis h, List<String> original);

 }

So far you have probably been thinking in terms of translating between real languages, with all their nasty complexities. But you do not need that complexity to test your algorithm. You can test it using a fake pair of languages, which are much simpler than real languages. Or (equivalently) using a TranslationOptionGenerator that is not as rich as a practical one. Use dependency injection to associate the Translator with a TranslationOptionGenerator.

Use Simple Fake Implementations instead of Realistic Associates

Now consider some of the simplest extreme cases of a TranslationOptionGenerator that the algorithm must deal with:

  • There are no options for the empty hypothesis.
  • There is only one option.
  • There are only two options.
  • An English to Harappan translator. Nobody knows how to translate to Harappan, so this translator indicates it is unable to translate every time.
  • The identity translator: a translator that translates English to English.
  • A Limited English to newborn translator: it can translate only the sentences "I am hungry", "I need changing" and "I am tired", which it translates to "Waaaa!".
  • A simple British to US English translator: most words are the same, but a few have different spellings.

You can use these to generate test cases that do not require some of the loops, or do not require the isApplicable test to be made.

Add those test cases to your TODO list. You will have to write simple fake TranslationOptionGenerator objects for use by those test cases.
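One of those fakes can be tiny. Here is a sketch of the "English to newborn" generator from the list above; the `Hypothesis`, `TranslationOption`, and generator shapes are hypothetical minimal stand-ins, restated so the snippet compiles on its own.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

class Hypothesis {
    final boolean started;        // has any output been produced yet?
    Hypothesis(boolean started) { this.started = started; }
}

class TranslationOption {
    final String target;
    TranslationOption(String target) { this.target = target; }
}

interface TranslationOptionGenerator {
    Collection<TranslationOption> getOptionsFor(Hypothesis h, List<String> original);
}

class NewbornOptionGenerator implements TranslationOptionGenerator {
    private static final Set<List<String>> KNOWN = Set.of(
        List.of("I", "am", "hungry"),
        List.of("I", "need", "changing"),
        List.of("I", "am", "tired"));

    public Collection<TranslationOption> getOptionsFor(Hypothesis h, List<String> original) {
        // Only three whole sentences are translatable, all to "Waaaa!";
        // everything else yields no options, so decoding fails cleanly.
        if (!h.started && KNOWN.contains(original)) {
            return List.of(new TranslationOption("Waaaa!"));
        }
        return List.of();
    }
}
```

Tests driven by a fake like this exercise the decoder's loops and its "unable to translate" path without any real linguistic data.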

  • Actually, I can think of a lot of example sentences with their translations. The problem is that for one of these sentences I will have to make a big jump rather than a baby step. I don't think dividing it into various classes will ease the problem, and anyway from a design perspective I prefer the body of the algorithm in one place for readability.
    – oshai
    Commented Sep 17, 2014 at 21:59
  • This is a good answer, but it goes wrong in suggesting that every little class needs to have its own set of tests. Doing so is perfectly valid, of course, but so is writing tests only for the high-level client classes. You can always rely on the refactoring step of the TDD process to achieve a good OO design.
    – Rogério
    Commented Sep 18, 2014 at 21:29

The key here is to understand that "baby steps" do not necessarily mean writing just a small amount of production code at a time. You can just as well write a lot of it, provided you do it to pass a relatively small & simple test.

Some people think that TDD can only be applied one way, namely by writing unit tests. This is not true. TDD does not dictate the kind of tests you should write. It's perfectly valid to do TDD with integration tests that exercise a lot of code. Each test, however, should be focused on a single well-defined scenario. This scenario is the "baby step" that really matters, not the number of classes that may get exercised by the test.

Personally, I develop a complex Java library exclusively with integration-level tests, rigorously following the TDD process. Quite often, I create a very small and simple-looking integration test, which ends up taking a lot of programming effort to make pass, requiring changes to several existing classes and/or the creation of new ones. This has been working well for me for the past 5+ years, to the point I now have over 1300 such tests.


TL;DR: Start with nothing and don't write any code that a test does not call for. Then you can't help but TDD the solution

When building something via TDD, you should start with nothing and then implement test cases until it does the thing you want. That's why they call it red-green-refactor.

Your first test would be to see if the internal object has zero hypotheses (in an empty implementation the list would be null [red]). Then you initialize the hypothesis list [green].

Next you would write a test that the hypothesis is checked (it's just created [red]). Implement the "if applicable" logic and apply it to the one hypothesis [green].

You write a test that when a hypothesis is applicable, then create a new hypothesis (check if there's > 1 hypothesis for a hypothesis that's applicable [red]). Implement the create hypothesis logic and stick it in the if body. [green]

Conversely, if a hypothesis is not applicable, then do nothing [green].

Just kinda follow that logic to individually test the algorithm over time. It's much easier to do with an incomplete class than a complete one.
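That very first red/green pair can be sketched in a few lines, with a bare assert standing in for a JUnit test. `Decoder` and its internal hypothesis list are hypothetical names, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;

class Decoder {
    // green: initialising the list here is what makes the first test pass;
    // in the red state this field would still be null
    private final List<Object> hypotheses = new ArrayList<>();

    int hypothesisCount() { return hypotheses.size(); }
}
```

Each subsequent step in the sequence above adds one more test and just enough decoder code to pass it.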

  • What I understand is that you suggest testing it from the inside out. Am I correct?
    – oshai
    Commented Sep 16, 2014 at 14:21
  • Yes sir. It's a bit harder when you already have the implementation completed. This way your tests also describe the evolution of the algorithm, and you get immediate feedback if one of your changes introduces a bug, because you now need to look back and check to see what the change did and whether or not your tests are still valid.
    – C Bauer
    Commented Sep 16, 2014 at 14:49
  • My answer is a valid, high level look at how TDD works. However I think that @Raedwald's answer is more specific and describes how TDD can lead to interfaces that conform to SOLID best practices.
    – C Bauer
    Commented Sep 16, 2014 at 14:50
  • Note that TDD doesn't forbid (or advocate against) thinking ahead. It only forbids coding ahead.
    – cadolphs
    Commented Nov 26, 2019 at 19:03
