I just had the idea to develop an algorithm to calculate and highlight the difference between two strings. I know that there are already some libraries to do the job but i just tried to make my own. I really don't know whether this one is similar to the existing ones but it seems to work fine. As of now it's still in it's early stage which means depending on the situation it sometimes produce some multiple consecutive deletion and insertion spans which probably require a second pass to consolidate them under two single delete and insert spans.
How it works;
It takes two strings like
- "the quick brown fox jumps over the lazy dog"
and
- "the quick brown coyote jumps over the lazy dog"
It will create match a string up until it meets the first mismatching character in-between the two. The index of this character is designated by base index (bi
). So in this case matchStr
is "the quick brown " then it will generate two new strings as longer
("coyote jumps over the lazy dog") and shorter
("fox jumps over the lazy dog"). Now the Array.prototype.rotate()
generic method, which i had implemented a while back for another project walks in. Array.prototype.rotate()
can rotate the array in both directions but in this particular case we will only rotate it in one direction. shorter
stays static and longer
gets rotated to find the longest overlapping sub-string. So it goes like this;
XX: "fox jumps over the lazy dog"
00: "coyote jumps over the lazy dog"
01: "oyote jumps over the lazy dogc"
02: "yote jumps over the lazy dogco"
03: "ote jumps over the lazy dogcoy"
04: "te jumps over the lazy dogcoyo"
05: "e jumps over the lazy dogcoyot"
06: " jumps over the lazy dogcoyote"
07: "jumps over the lazy dogcoyote "
08: "umps over the lazy dogcoyote j"
09: "mps over the lazy dogcoyote ju"
10: "ps over the lazy dogcoyote jum"
11: "s over the lazy dogcoyote jump"
12: " over the lazy dogcoyote jumps"
13: "over the lazy dogcoyote jumps "
14: "ver the lazy dogcoyote jumps o"
15: "er the lazy dogcoyote jumps ov"
16: "r the lazy dogcoyote jumps ove"
17: " the lazy dogcoyote jumps over"
18: "the lazy dogcoyote jumps over "
19: "he lazy dogcoyote jumps over t"
20: "e lazy dogcoyote jumps over th"
21: " lazy dogcoyote jumps over the"
22: "lazy dogcoyote jumps over the "
23: "azy dogcoyote jumps over the l"
24: "zy dogcoyote jumps over the la"
25: "y dogcoyote jumps over the laz"
26: " dogcoyote jumps over the lazy"
27: "ogcoyote jumps over the lazy d"
28: "gcoyote jumps over the lazy do"
00: "coyote jumps over the lazy dog"
So we will collect all matched sub-strings for each single character rotated version of longer
for it's length many times (one complete turn). Once we have length many matched sub-strings, among which we will chose the longest one. So as you will notice at rotate count (rc
) #03 both strings produce the longest matched sub-string. Which is " jumps over the lazy dog"
So now we know that in the shorter
string (which doesn't rotate) the mismatching characters (mismatching sub-string) are the ones at indices 0, 1 and 2. So the start of match is index 3 (cd.fis
). While for the longer
it (cd.fil
) can be calculated as start of match of shorter (cd.fis
) + rotate count (rc
). However there may be cases in which the longer sentence has a shorter mismatching sub-string and in that case (think about it) cd.fil
would be cd.fis
+ len
- rc
where len
is the length of longer
.
So now that we have the matching sub string at the head, the two mismatching strings and the remaining matching string (up until the next mismatch) it's just concatenating the string into one with necessary HTML mark up. Then we recursively feed the function up until we reach to the end of one of the strings.
I totally don't know if this algorithm is a reasonable approach for this job. I tried some edge cases where it seems to work fine but I might be wrong and it might turn out to be inefficient. What would be your ideas to speed it up..?
The code;
Array.prototype.rotate = function(n) {
var len = this.length;
return !(n % len) ? this
: n > 0 ? this.map((e,i,a) => a[(i + n) % len])
: this.map((e,i,a) => a[(len - (len - i - n) % len) % len]);
};
String.prototype.diff = function(s){
var getBaseIndex = function(s,l){ // returns the index of first mismatching character
var i = 0;
while(s[i] === l[i] && i < s.length) ++i;
return i;
},
findFirstChange = function(s,l){ // returns the first matching substring after base index
var fi = len,
substr = "",
match = false,
i = 0;
while (!match && i < s.length) {
s[i] !== l[i] ? ++i : match = !match;
}
match && (fi = i); // match starts from this index
while (match && i < s.length) {
s[i] === l[i] ? substr += s[i++] : match = !match;
}
return {bix: bi, // base index : index of first mismaching character
fis: fi, // index of next re match in shorter string
fil: fi, // index of next re match in longer string (will be adjusted later)
fss: substr // next matching substring after first mismatch
};
},
isThisLonger = true; // true if the string designated by "this" is longer
bi = getBaseIndex(this,s),
matchStr = s.slice(0,bi), // the matching portion at the beginning
long = this.length >= s.length ? (isThisLonger = true, [...this].slice(bi)) // converted to array as the
: (isThisLonger = false, [...s].slice(bi)), // long string gets rotated
short = isThisLonger ? s.slice(bi) : this.slice(bi),
len = long.length,
substrings = [],
cd = {}, // change data !! important
change = [], // holds deleted and inserted substrings at indices 0 and 1
nextThis = "", // remaining part of old string to feed recursive call
nextS = "", // remaining part of new string to feed recursive call
result = ""; // the glorious result
for (var rc = 0; rc < len; rc++){ // rc -> rotate count
cd = findFirstChange(short,long.rotate(rc)); // collect change indices
cd.fil = rc < len - cd.fis ? cd.fil + rc : cd.fis + len - rc; // adjusted for index of next re match in longer string
substrings.push(cd);
}
cd = !!substrings.length && substrings.sort((a,b) => b.fss.length - a.fss.length || a.fis - b.fis || b.fil - a.fil )[0];
long = long.join("");
if (cd) {
change = isThisLonger ? [long.slice(0,cd.fil), short.slice(0,cd.fis)]
: [short.slice(0,cd.fis), long.slice(0,cd.fil)];
nextThis = isThisLonger ? long.slice(cd.fil) : short.slice(cd.fis);
nextS = isThisLonger ? short.slice(cd.fis) : long.slice(cd.fil);
change[0] = change[0] && ('<span class = "deleted">' + change[0] + '</span>');
change[1] = change[1] && ('<span class = "inserted">' + change[1] + '</span>');
result = matchStr + change[0] + change[1];
} else result = this;
result += (nextThis !== "" || nextS !== "") ? nextThis.diff(nextS) : "";
return result;
};
textOld.oninput = function(e){textNew.innerText = this.value};
textNew.onfocus = function(e){this.select()};
myButton.onclick = function(e){textdiff.innerHTML = textOld.value.diff(textNew.value)}
.deleted {background-color : LightPink;
text-decoration : line-through}
.inserted {background-color : PaleGreen}
<div>
<textarea id="textOld" placeholder="Please type something here" rows = "4" cols = "25"></textarea>
<textarea id="textNew" placeholder="Please edit the previous text here" rows = "4" cols = "25"></textarea>
<button id = myButton style = "display:block"> Click to get diff</button>
<p id="textdiff"></p>
</div>