Take the tour ×
Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development. It's 100% free, no registration required.

I've just built a self-balancing tree (red-black) in Java (language should be irrelevant for this question though), and I'm trying to come up with a good means of testing that it's properly balanced.

I've tested all the basic tree operations, but I can't think of a way to test that it is indeed well and truly balanced. I've tried inserting a large dictionary of words, both pre-sorted and un-sorted. With a balanced tree, those should take roughly the same amount of time, but an unbalanced tree would take significantly longer on the already-sorted list. But I don't know how to go about testing for that in any reasonable, reproducible way. (I've tried doing millisecond tests on these, but there's no noticeable difference - probably because my source data is too small.)

Is there a better way to be sure that the tree is really balanced? Say, by looking at the tree after it's created and seeing how deep it goes? (That is, without modifying the tree itself by adding a depth field to each node, which is just wasteful if you don't need it for anything other than testing.)

share|improve this question
 
Your source data is too small? Try inserting all 32-bit integers, sorted and unsorted. –  U2EF1 Nov 11 at 2:26
add comment

2 Answers

up vote 9 down vote accepted

One way to do this would be to create a method on your tree that measures the depth of the tree at a given node. You don't have to store the value, and if you use such a getDepth() method only for testing, then there's no extra overhead for normal tree operations. The getDepth() method would recursively traverse its child nodes and return the maximum depth found.

Once you have that, you can then check that your whole tree is balanced by recursing over each node of the tree, and verifying a condition something like:

Math.abs(getDepth(left) - getDepth(right)) <= 1
share|improve this answer
1  
As an addition to this answer: if you prevent direct access to the implementation (eg, by using a factory and an interface), you should feel free to add any useful method to the implementation class. Including methods designed solely for testing. –  kdgregory Nov 11 at 1:36
 
Is the depth-difference of no greater than 1 at every node guaranteed for red-black? I know this would work for AVL trees, but I remember reading somewhere that other balanced trees don't necessarily have this property despite being relatively balanced. I'd think something like comparing the maximum depth to the log(n) would a more general solution, but I think there's some wiggle-room there as well... –  Darrel Hoffman Nov 11 at 2:38
 
Well if you have a different criteria for what you consider "balanced", then use that. My answer is showing another way to test that your tree meets your requirements, other than measuring performance like you suggest (measuring performance only determines the "balanced-ness" indirectly and is not definitive). –  Greg Hewgill Nov 11 at 2:39
 
Yeah, this probably is the way to go. I just wasn't sure if it would work for red-black. (I've done AVL trees a couple times in various languages, but this is my first time experimenting with other types of balanced trees.) –  Darrel Hoffman Nov 11 at 2:45
1  
The appropriate invariant of red-black tree is "Every path from a given node to any of its descendant leaves contains the same number of black nodes." And only that many red nodes, because red node may only have black children. –  Jan Hudec Nov 11 at 8:36
show 1 more comment

If you want to measure time complexity for reads one idea is that you could have the tree issuing callbacks for traversals. This way you could measure the number of operations for a read, which would be a more reliable measurement of time complexity.

share|improve this answer
 
And also add complexity to every step of the operation. If it were just being used to test (and compare to something unbalanced, say), that might be okay, but you'd have to then disable that code before putting into production. –  Darrel Hoffman Dec 10 at 21:58
 
Yes, it would be costly. A compromise is to use an internal counter, that should be cheaper. Direct statistics are as I said IMO the only really reliable way of measuring time complexity. But if you want to get fancy you can try ThreadMXBean.getThreadCPUTime() to measure it. –  Alexander Torstling Dec 11 at 6:41
 
Well, as long as you can prove the tree is balanced at any given point, you know that any basic operations will operate in log time. So a solution which analyzes the current state of the tree and proves it's balanced should be good enough, and doesn't needlessly add time or space complexity when it's not being used for testing. (And is in fact what I've already done - this was a month ago.) –  Darrel Hoffman Dec 11 at 16:10
 
Yes well I just wanted to add some pointers as to how you could go ahead with your original approach if you wanted to. That's why I started with "If you want to measure time complexity". For reference. –  Alexander Torstling Dec 11 at 19:40
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.