I have a symmetric matrix like shown in the image attached below.

Example matrix

I've made up the notation A.B which represents the value at grid point (A, B). Furthermore, writing A.B.C gives me the minimum grid point value like so: MIN((A,B), (A,C), (B,C)).

As another example A.B.D gives me MIN((A,B), (A,D), (B,D)).

My goal is to find the minimum values for ALL combinations of letters (not repeating) for one row at a time e.g for this example I need to find min values with respect to row A which are given by the calculations:

A.B = 6
A.C = 8
A.D = 4
A.B.C = MIN(6,8,6) = 6
A.B.D = MIN(6, 4, 4) = 4
A.C.D = MIN(8, 4, 2) = 2
A.B.C.D = MIN(6, 8, 4, 6, 4, 2) = 2

I realize that certain calculations can be reused which becomes increasingly important as the matrix size increases, but the problem is finding the most efficient way to implement this reuse.

Can point me in the right direction to finding an efficient algorithm/data structure I can use for this problem?

share|improve this question

What about B.C, B.D, and C.D? Why does your goal not include these? – mbeckish Jun 3 '11 at 13:08
The values I need work on a row by row basis. So I need to find the min values with respect to row A only first, do some calculations on those values, the move on to row B. – Projectile Fish Jun 3 '11 at 13:11
So your goal, as stated above, is not complete? You have stated the goal for row A, but then you also have goals for the other rows? – mbeckish Jun 3 '11 at 13:12
Yeah sorry, I just updated the question. – Projectile Fish Jun 3 '11 at 13:16
If the rows are independent, why bother looking at it as a matrix? Just consider one row at a time... Or am I missing something? – Aaron Watters Jun 3 '11 at 13:19
show 7 more comments
feedback

2 Answers

up vote 1 down vote accepted

You'll want to think about the lattice of subsets of the letters, ordered by inclusion. Essentially, you have a value f(S) given for every subset S of size 2 (that is, every off-diagonal element of the matrix - the diagonal elements don't seem to occur in your problem), and the problem is to find, for each subset T of size greater than two, the minimum f(S) over all S of size 2 contained in T. (And then you're interested only in sets T that contain a certain element "A" - but we'll disregard that for the moment.)

First of all, note that if you have n letters, that this amounts to asking Omega(2^n) questions, roughly one for each subset. (Excluding the zero- and one-element subsets and those that don't include "A" saves you n + 1 sets and a factor of two, respectively, which is allowed for big Omega.) So if you want to store all these answers for even moderately large n, you'll need a lot of memory. If n is large in your applications, it might be best to store some collection of pre-computed data and do some computation whenever you need a particular data point; I haven't thought about what would work best, but for example computing data only for a binary tree contained in the lattice would not necessarily help you anything beyond precomputing nothing at all.

With these things out of the way, let's assume you actually want all the answers computed and stored in memory. You'll want to compute these "layer by layer", that is, starting with the three-element subsets (since the two-element subsets are already given by your matrix), then four-element, then five-element, etc. This way, for a given subset S, when we're computing f(S) we will already have computed all f(T) for T strictly contained in S. There are several ways that you can make use of this, but I think the easiest might be to use two such subset S: let t1 and t2 be two different elements of T that you may select however you like; let S be the subset of T that you get when you remove t1 and t2. Write S1 for S plus t1 and write S2 for S plus t2. Now every pair of letters contained in T is either fully contained in S1, or it is fully contained in S2, or it is {t1, t2}. Look up f(S1) and f(S2) in your previously computed values, then look up f({t1, t2}) directly in the matrix, and store f(T) = the minimum of these 3 numbers.

If you never select "A" for t1 or t2, then indeed you can compute everything you're interested in while not computing f for any sets T that don't contain "A". (This is possible because the steps outlined above are only interesting whenever T contains at least three elements.) Good! This leaves just one question - how to store the computed values f(T). What I would do is use a 2^(n-1)-sized array; represent each subset-of-your-alphabet-that-includes-"A" by the (n-1) bit number where the ith bit is 1 whenever the (i+1)th letter is in that set (so 0010110, which has bits 2, 4, and 5 set, represents the subset {"A", "C", "D", "F"} out of the alphabet "A" .. "H" - note I'm counting bits starting at 0 from the right, and letters starting at "A" = 0). This way, you can actually iterate through the sets in numerical order and don't need to think about how to iterate through all k-element subsets of an n-element set. (You do need to include a special case for when the set under consideration has 0 or 1 element, in which case you'll want to do nothing, or 2 elements, in which case you just copy the value from the matrix.)

share|improve this answer
Can I just get a clarification on iterating through the subsets? You say that I can iterate through the subsets in numerical order, but if I was to add 1 to 0010110, I'd get 0010111 which is now part of the 5 letter subset. How does this help me iterate through all the subsets layer by layer? – Projectile Fish Jun 7 '11 at 10:08
1  
Sorry - I wasn't entirely clear. When thinking about the algorithm from an abstract point of view, I found it easiest to think about it in a layer-by-layer fashion, first doing three-element subsets, then four, etc. Once you get to implementation, it's actually easier to implement in numerical order, which is not layer-by-layer but still has the essential property: before you process any subset S, you process all subsets of S. – Erik P. Jun 10 '11 at 18:40
feedback

Well, it looks simple to me, but perhaps I misunderstand the problem. I would do it like this:

  • let P be a pattern string in your notation X1.X2. ... .Xn, where Xi is a column in your matrix

  • first compute the array CS = [ (X1, X2), (X1, X3), ... (X1, Xn) ], which contains all combinations of X1 with every other element in the pattern; CS has n-1 elements, and you can easily build it in O(n)

  • now you must compute min (CS), i.e. finding the minimum value of the matrix elements corresponding to the combinations in CS; again you can easily find the minimum value in O(n)

  • done.

Note: since your matrix is symmetric, given P you just need to compute CS by combining the first element of P with all other elements: (X1, Xi) is equal to (Xi, X1)


If your matrix is very large, and you want to do some optimization, you may consider prefixes of P: let me explain with an example

  • when you have solved the problem for P = X1.X2.X3, store the result in an associative map, where X1.X2.X3 is the key

  • later on, when you solve a problem P' = X1.X2.X3.X7.X9.X10.X11 you search for the longest prefix of P' in your map: you can do this by starting with P' and removing one component (Xi) at a time from the end until you find a match in your map or you end up with an empty string

  • if you find a prefix of P' in you map then you already know the solution for that problem, so you just have to find the solution for the problem resulting from combining the first element of the prefix with the suffix, and then compare the two results: in our example the prefix is X1.X2.X3, and so you just have to solve the problem for X1.X7.X9.X10.X11, and then compare the two values and choose the min (don't forget to update your map with the new pattern P')

  • if you don't find any prefix, then you must solve the entire problem for P' (and again don't forget to update the map with the result, so that you can reuse it in the future)

This technique is essentially a form of memoization.

share|improve this answer
Ok yes this works, but i'm wondering how I can reuse calculations efficiently. So for example the calculations done in X1.X2.X3, can be reused in X1.X2.X3.X4, X1.X2.X3.X4.X5 and so on. How should I store previous calculations so they're easy to re-access and use? – Projectile Fish Jun 3 '11 at 13:50
@Projectile Fish: I edited my answer to address your comment. – MarcoS Jun 3 '11 at 14:28
feedback

Your Answer

 
or
required, but never shown
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.