3
\$\begingroup\$

Although there exists the most efficient "Knuth-Morris-Pratt" algorithm of string pattern matching. I tried to achieve the same result with a different approach. But I am not sure about it's efficiency. I am a novice in programming. Can anyone show me how to analyze this algorithm and find it's efficiency?

Algorithm:

Step 1: Calculate the integer value of the pattern by adding the integer values of each of the characters in it. Let it be "m"

Step2: Calculate the integer values of parts of the string, with the same length as that of the pattern in the similar way as described in step 1.

Step3 : Store these values in an array.

Step 4 : Compare the values in array with the "m" value from step1.

Step 5: If the values match then check if the part of the string matches with the pattern. If it does then return true. If the pattern does not match at all then return false.

I have coded the algorithm in Java. Here is the code.

public class StringPatternMatchingAlgorithm {
    public static boolean patternMatching(String str,String pattern){
        int patternLength = pattern.length();
        int patternSum = 0;
        int[] patternSumArray = new int[(str.length() - patternLength)+1];
        int patternSumArrayIndex = 0;
        int patternSumToCheck = 0;
        String toCompare = "";
        for(int i=0; i<pattern.length();i++){
            patternSumToCheck = patternSumToCheck + pattern.charAt(i);
            
        }
        for(int i=0; i<((str.length())-(pattern.length()-1));i++){
            patternSum = 0;
            for(int j=i; j<patternLength;j++){
               patternSum = patternSum + str.charAt(j);
            }
            patternLength++;
            patternSumArray[patternSumArrayIndex] = patternSum;
            if(patternSumArrayIndex < patternSumArray.length-1 ){
                patternSumArrayIndex++;
            }    
        }
        for(int i=0; i<patternSumArray.length;i++){
            if(patternSumArray[i] == patternSumToCheck){
                toCompare = "";
               for(int j=i; j<(i+pattern.length());j++){
                   toCompare = toCompare + str.charAt(j);
               }
               if(toCompare.equals(pattern)){
                  return true; 
               }
            }
        }
        return false;
    }
    public static void main(String[] args) {
        String str = "cdxabie";
        String pattern = "bie";
        System.out.println(patternMatching(str, pattern));
    }
 }

The code can be run here

New contributor
Deepeshkumar is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.
\$\endgroup\$
3
\$\begingroup\$

A couple of comments on your answer:

Code style

  • If you've already stored the pattern.length(), why keep calling the method? Just be consistent and use that variable. Same goes for str.length(). I'd just store it and use it.

  • I would store str.length()-(pattern.length()-1 in a variable an name it something like numOfCasesToTest, to make it clearer.

  • Check parenthesis usage:

for(int i=0; i<((str.length())-(pattern.length()-1));i++){

Could be written as:

for(int i = 0; i < str.length() - pattern.length() - 1; i++){

Unnecessary parenthesis makes code harder to read. For further details see Operator precedence.

  • Keep your spacing consistent. While some people prefer not having spaces between some or most operators, I'd rather have them, as I think it makes the code more readable. Whichever way you prefer, just keep consistent usage in the code. For example:
i<patternSumArray.length // no space around comparison operator
patternSumArray[i] == patternSumToCheck // space around comparison operator

Algorithm

That was all code style related. Now, a comment on your algorithm: Why calculate all posible cases sum, and only after that look for a match? This way, you are iterating over the whole str whereas it might not be necessary.

I see how you may have reached that solution: you thought process was as described in the algorithm. But you could just ommit the part in which you store the sums, and compare with the match straight away.

For example, say you had a string such as

Abcsomelongstringwithlotsofconentinit

And you are looking for the pattern Abc. Then, why bother doing of all those calculations? Simply, after you calculate one of those sums, compare it with the pattern sum.

Still (maybe I'm being naive), but I do not see how calculating the sum, comparing it, and then actually comparing the charaters is better than just comparing the characters in the first place

\$\endgroup\$
3
  • 2
    \$\begingroup\$ "I do not see ..." - Because you don't compare the characters at all if the sum already doesn't match. Rabin-Karp is better than sum, though. \$\endgroup\$ – Kelly Bundy 7 hours ago
  • \$\begingroup\$ @m-alorda Thanks for the review, will definitely follow the instructions. \$\endgroup\$ – Deepeshkumar 3 hours ago
  • 1
    \$\begingroup\$ Using the sum as a simple kind of rolling hash could in principle be an improvement on the naive algorithm, except you are calculating the sum of each substring independently, in the naive way. For there to be a potential performance benefit, the sum should be calculated using a moving window approach where you add the new character on the right of the window and subtract the character on the left of the window in order to move the window along by one without needing a loop over the whole window to recompute the sum. \$\endgroup\$ – kaya3 2 hours ago

Your Answer

Deepeshkumar is a new contributor. Be nice, and check out our Code of Conduct.

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.