Monday, April 7, 2008

Diff Algorithm with LINQ

A year or so ago I wrote a diff application similar to WinDiff, based on the core algorithm discussed in A Fast Diff Algorithm in Visual Basic .Net. Somewhere along the way, while upgrading my computer, I lost that application. I decided to recreate it, but this time to use LINQ as much as possible in the implementation. It is probably not the fastest approach; it is intended as an exercise in incorporating LINQ into the implementation of an algorithm.

What I have at this point is code that will find the longest common sequence (LCS) of two string arrays, meaning the longest contiguous run of elements that appears in both. An extension method, LCS, is available on any string collection that implements the IEnumerable interface. You start with an original string collection and pass the LCS method another string collection that you would like to diff against the original. It returns a DiffInfo object, which contains the start index of the LCS in the original string collection, the start index of the LCS in the compared string collection, and the length of the LCS. That information gives us a foundation on which to build an application that shows the diff of the two string collections. I will build that application in later posts.

Example:


string[] org = {"A","A","z", "A"};
string[] cmp = {"z", "A", "A", "A", "A"};
DiffInfo d = org.LCS(cmp);
//index = 0
Console.WriteLine("LCS start in original " + d.OriginalStartPos.ToString());
//index = 1
Console.WriteLine("LCS start in compared " + d.ComparedStartPos.ToString());
// length = 2
Console.WriteLine("Length of LCS " + d.Length);
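
For readers who want something to run the example against, here is a minimal brute-force sketch of what such an extension method could look like. It is not necessarily the implementation from this post: only the names LCS, DiffInfo, OriginalStartPos, ComparedStartPos and Length come from the example above, while the class name DiffExtensions, the runLength helper and the tie-breaking rule (earliest match wins) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the result type; only the property names appear in the post.
public class DiffInfo
{
    public int OriginalStartPos { get; set; }
    public int ComparedStartPos { get; set; }
    public int Length { get; set; }
}

// Hypothetical container class for the extension method.
public static class DiffExtensions
{
    public static DiffInfo LCS(this IEnumerable<string> original, IEnumerable<string> compared)
    {
        string[] org = original.ToArray();
        string[] cmp = compared.ToArray();

        // Length of the matching contiguous run that starts at org[i] and cmp[j].
        Func<int, int, int> runLength = (i, j) =>
            Enumerable.Range(0, Math.Min(org.Length - i, cmp.Length - j))
                      .TakeWhile(k => org[i + k] == cmp[j + k])
                      .Count();

        // Every pair of start positions is a candidate (brute force).
        var candidates =
            from i in Enumerable.Range(0, org.Length)
            from j in Enumerable.Range(0, cmp.Length)
            select new DiffInfo
            {
                OriginalStartPos = i,
                ComparedStartPos = j,
                Length = runLength(i, j)
            };

        // OrderByDescending is a stable sort, so among equally long runs
        // the earliest (i, j) pair is returned. Empty input yields all zeros.
        return candidates.OrderByDescending(d => d.Length).FirstOrDefault()
               ?? new DiffInfo();
    }
}
```

This checks every pair of start positions, so it is far slower than a real diff algorithm on large inputs, but it keeps the whole search in LINQ operators and gives the same result as the comments in the example (0, 1 and 2).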


Code available here
