A different approach to string pattern matching algorithm

Question

Although there exists the most efficient "Knuth-Morris-Pratt" algorithm of string pattern matching. I tried to achieve the same result with a different approach. But I am not sure about it's efficiency. I am a novice in programming. Can anyone show me how to analyze this algorithm and find it's efficiency?

Algorithm:
Step 1: Calculate the integer value of the pattern by adding the integer values of each of the characters in it. Let it be "m"
Step2: Calculate the integer values of parts of the string, with the same length as that of the pattern in the similar way as described in step 1.
Step3 : Store these values in an array.
Step 4 : Compare the values in array with the "m" value from step1.
Step 5: If the values match then check if the part of the string matches with the pattern. If it does then return true. If the pattern does not match at all then return false.

I have coded the algorithm in Java. Here is the code.

public class StringPatternMatchingAlgorithm {
    public static boolean patternMatching(String str,String pattern){
        int patternLength = pattern.length();
        int patternSum = 0;
        int[] patternSumArray = new int[(str.length() - patternLength)+1];
        int patternSumArrayIndex = 0;
        int patternSumToCheck = 0;
        String toCompare = "";
        for(int i=0; i<pattern.length();i++){
            patternSumToCheck = patternSumToCheck + pattern.charAt(i);
            
        }
        for(int i=0; i<((str.length())-(pattern.length()-1));i++){
            patternSum = 0;
            for(int j=i; j<patternLength;j++){
               patternSum = patternSum + str.charAt(j);
            }
            patternLength++;
            patternSumArray[patternSumArrayIndex] = patternSum;
            if(patternSumArrayIndex < patternSumArray.length-1 ){
                patternSumArrayIndex++;
            }    
        }
        for(int i=0; i<patternSumArray.length;i++){
            if(patternSumArray[i] == patternSumToCheck){
                toCompare = "";
               for(int j=i; j<(i+pattern.length());j++){
                   toCompare = toCompare + str.charAt(j);
               }
               if(toCompare.equals(pattern)){
                  return true; 
               }
            }
        }
        return false;
    }
    public static void main(String[] args) {
        String str = "cdxabie";
        String pattern = "bie";
        System.out.println(patternMatching(str, pattern));
    }
 }

The code can be run here

m-alorda · Accepted Answer · 2021-07-31 16:34:23Z

A couple of comments on your answer:

Code style

If you've already stored the pattern.length(), why keep calling the method? Just be consistent and use that variable. Same goes for str.length(). I'd just store it and use it.
I would store str.length()-(pattern.length()-1 in a variable an name it something like numOfCasesToTest, to make it clearer.
Check parenthesis usage:

for(int i=0; i<((str.length())-(pattern.length()-1));i++){

Could be written as:

for(int i = 0; i < str.length() - pattern.length() - 1; i++){

Unnecessary parenthesis makes code harder to read. For further details see Operator precedence.

Keep your spacing consistent. While some people prefer not having spaces between some or most operators, I'd rather have them, as I think it makes the code more readable. Whichever way you prefer, just keep consistent usage in the code. For example:

i<patternSumArray.length // no space around comparison operator
patternSumArray[i] == patternSumToCheck // space around comparison operator

Algorithm

That was all code style related. Now, a comment on your algorithm: Why calculate all posible cases sum, and only after that look for a match? This way, you are iterating over the whole str whereas it might not be necessary.

I see how you may have reached that solution: you thought process was as described in the algorithm. But you could just ommit the part in which you store the sums, and compare with the match straight away.

For example, say you had a string such as

Abcsomelongstringwithlotsofconentinit

And you are looking for the pattern Abc. Then, why bother doing of all those calculations? Simply, after you calculate one of those sums, compare it with the pattern sum.

Still (maybe I'm being naive), but I do not see how calculating the sum, comparing it, and then actually comparing the charaters is better than just comparing the characters in the first place

"I do not see ..." - Because you don't compare the characters at all if the sum already doesn't match. Rabin-Karp is better than sum, though. — Kelly Bundy, 7 hours ago
@m-alorda Thanks for the review, will definitely follow the instructions. — Deepeshkumar, 3 hours ago
Using the sum as a simple kind of rolling hash could in principle be an improvement on the naive algorithm, except you are calculating the sum of each substring independently, in the naive way. For there to be a potential performance benefit, the sum should be calculated using a moving window approach where you add the new character on the right of the window and subtract the character on the left of the window in order to move the window along by one without needing a loop over the whole window to recompute the sum. — kaya3, 2 hours ago

Stack Exchange Network

current community

your communities

more stack exchange communities

A different approach to string pattern matching algorithm

1 Answer 1

Code style

Algorithm

Your Answer

Not the answer you're looking for? Browse other questions tagged java algorithm strings pattern-matching or ask your own question.

Hot Network Questions

A different approach to string pattern matching algorithm

1 Answer 1

Code style

Algorithm

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged java algorithm strings pattern-matching or ask your own question.

Related

Hot Network Questions