Implement strStr()

Question

Implement strStr() function.

Return a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack.

Do not use any system library functions such as strlen.

Answer 1

The strstr function locates a specified substring within a source string. Strstr returns a pointer to the first occurrence of the substring in the source. If the substring is not found, strstr returns a null pointer.

As expected, this question is very popular among technical interviews. This question tests your ability in manipulating strings using pointers, as well as your coding ability.

Of course, you can demonstrate to your interviewer that this problem can be solved using known efficient algorithms such as Rabin-Karp algorithm, KMP algorithm, and the Boyer-Moore algorithm. Since these algorithms are usually studied in advanced algorithms class, for an interview it is sufficient to solve it using the most direct method -- The brute force method.

For non-C programmers who are not familiar with C-style strings, your first reaction is to obtain the length of the string. But remember, use of system library is not allowed. If you are not familiar with pointer manipulation, it's time to brush up your pointer skills.

The brute force method is straightforward to implement. We scan the substring (the target) with the string from its first position and start matching all subsequent letters one by one. If one of the letter doesn't match, we start over again with the next position in the string. Assume that N = length of string, M = length of substring, then the complexity is O(N*M).

Think you've got it all in your head? Try writing the code down on a piece of paper. Remember to test for special cases.

char* StrStr(const char *str, const char *target) {
    if (!*target) return str;
    char *p1 = (char*)str;
    while (*p1) {
        char *p1Begin = p1, *p2 = (char*)target;
        while (*p1 && *p2 && *p1 == *p2) {
            p1++;
            p2++;
        }
        if (!*p2)
            return p1Begin;
        p1 = p1Begin + 1;
    }
    return NULL;
}

Did you handle the special case correctly? That is, when the target substring is an empty string. What should you return in this case? The correct implementation of strstr should return the full string in this case.

The above solution is usually good enough for an interview. But it turns out we are able to improve the code a little further. Note that the above code iterates at most N times in the outer loop.

The improvement is based on one observation: We only need to iterate at most N-M+1 times in the outer loop. Why? Because after looping more than N-M+1 times, the string has less than M characters to be matched with the substring. In this case, we know no more substring will be found in the string, and therefore saving us from additional comparisons (which might be substantial especially when M is large compared to N.)

How do we loop for only N-M+1 times? We are able to calculate the size of the string and substring by iterating a total of N+M times. In fact, finding just the length of the substring, M is sufficient. We use an additional pointer, p1Adv and advance it M-1 times ahead of the string pointer. Therefore, when p1Adv points to '\0', we know that it has iterated N-M+1 times.

char* StrStr(const char *str, const char *target) {
    if (!*target) return str;
    char *p1 = (char*)str, *p2 = (char*)target;
    char *p1Adv = (char*)str;
    while (*++p2)
        p1Adv++;
    while (*p1Adv) {
        char *p1Begin = p1;
        p2 = (char*)target;
        while (*p1 && *p2 && *p1 == *p2) {
            p1++;
            p2++;
        }
        if (!*p2)
            return p1Begin;
        p1 = p1Begin + 1;
        p1Adv++;
    }
    return NULL;
}

Answer 2

Here I am using the KMP algorithm. Many articles erroneously explain the Knuth–Morris–Pratt algorithm to be Morris–Pratt algorithm. It is worth to learn the difference :)

class Solution {
public:
    void compute_prefix(char* T, int m, vector<int>& pi) {
        int i = 0;
        int j = pi[0] = -1;
        while (i < m) {
            while (j > -1 && T[i] != T[j]) {
                j = pi[j];
            }
            ++i;
            ++j;
            pi[i] = (T[i] == T[j]) ? (pi[j]) : (j);
        }
    }

    char *strStr(char *haystack, char *needle) {
        int n = strlen(haystack);
        int m = strlen(needle);
        if (m == 0) {
            return haystack;
        }

        vector<int> pi(m+1);
        compute_prefix(needle, m, pi);

        int i = 0;
        int j = 0;
        while (i < n) {
            while (j > -1 && haystack[i] != needle[j]) {
                j = pi[j];
            }
            ++i;
            ++j;
            if (j >= m) {
                return haystack+i-j;
            }
        }
        return 0;
    }
};

Answer 3

class Solution {
public:
    char *strStr(char *haystack, char *needle) {
        // Start typing your C/C++ solution below
        // DO NOT write int main() function
        if (!haystack || !needle) return NULL;
        if (!*needle) return haystack;
        string const text(haystack);
        string const pattern(needle);
        size_t const pMax = pattern.size() - 1;
        for(size_t tailPos = pMax; tailPos < text.size();){
            size_t tPos = tailPos;
            int pPos = pMax;
            for(; pPos >= 0; --pPos, --tPos){
                if (text[tPos] != pattern[pPos]){
                    ++tailPos;
                    break;
                }
            }//for pPos
            if (pPos < 0) return (haystack + tailPos - pMax);
        }//for tPos
        return NULL;
    }
};

Answer 4

KMP search. kmp_table is to calculate the auxiliary table.

char *strStr(char *S, char *W) {
    assert(S!=NULL && W!=NULL);
    if (*W == '\0') return S;
    int Slen = strlen(S);
    int Wlen = strlen(W);

    vector<int> T(Wlen);
    kmp_table(W, T);

    int m = 0;
    int i = 0;
    while ((m+i) < Slen) {
        if (W[i] == S[m+i]) {
            if (i == (Wlen - 1)) return S+m;
            i++;
        } else {
            m = m + i - T[i];
            if (T[i] > -1)
                i = T[i];
            else
                i = 0;
        }
    }
    return NULL;
}
void kmp_table(char *W, vector<int> &T) 
{
    T[0] = -1;
    T[1] = 0;

    int pos = 2;
    int cnd = 0;
    int len = strlen(W);
    while (pos < len) {
        if (W[pos - 1] == W[cnd]) {
            T[pos++] = ++cnd;
        } else if (cnd > 0) {
            cnd = T[cnd];
        } else {
            T[pos++] = 0;
        }
    }
}

Answer 5

char *strStr(char *haystack, char *needle) {
    // Start typing your C/C++ solution below
    // DO NOT write int main() function
    int *next = NULL;
    constructNextTable(needle, next);

    int i=0, j=0;

    while(haystack[i] != '\0' && needle[j] != '\0') {
        if (haystack[i] == needle[j]) {
            i++;
            j++;
        }
        else {
            if (j==0) {
                i++;
            }
            else {
                j = next[j];
            }
        }

    }

    if(!next)
        delete next;
    if (needle[j] == '\0') {
        return &haystack[i-j];
    }

    return NULL;
}

void constructNextTable(char *str, int *&next) {

    int length = strLength(str);
    next = new int[length];

    int i=0, j=1;
    next[0] = 0;
    while(str[j] != '\0') {

        next[j] = i;
        if (str[i] == str[j]) {
            i++;
        }
        else {
            i = next[i];
        }
        j++;
    }
}

int strLength(char *str) {
    int length = 0;
    char *p = str;
    while(str[length] != '\0')
        length++;
    return length;
}

Answer 6

public class Solution {

public String strStr(String haystack, String needle) {
    if (needle.length() == 0) return haystack;
    if (haystack.length()<needle.length()) return null;
    for (int i = 0; i < haystack.length();i++){
        int j = i;
        // first judge if there is no enough length left to hit needle, very important to save time
        // second obvious
        // tracking needle in haystack
        while (needle.length()+i <= haystack.length()&& j<haystack.length() && haystack.charAt(j) == needle.charAt(j-i)){
            j++;
            if (j-i == needle.length()){
                return haystack.substring(i);
            }
        }

    }
    return null;

}

}

Answer 7

/*KMP algorithm, use number of chars of same prefix instead of index*/
void computePrefix(char *str, vector<int>& pi) {
    int n = strlen(str);
    pi[0] = 0;
    int k = 0;//number of current prefix char start from zero
    for (int i=1; i<n; i++) {
        while (k>0 && str[k] != str[i]) {
            k = pi[k-1];
        }
        if (str[k] == str[i])
            ++k;
        pi[i] = k;//pi stores number of prefix char on index of string i
    }
}

char *strStr(char *h, char *n) {
    if (!h || !n) return NULL;

    if (*n == '\0') {
        return h;
    }
    else if (*h == '\0') {
        return NULL;
    }

    int nlen = strlen(n), hlen = strlen(h);
    vector<int> pi(nlen);
    computePrefix(n, pi);
    int cur = 0;//currently matched number of chars
    for (int i = 0; i < hlen; i++) {
        while (cur > 0 && n[cur] != h[i]) {
            cur = pi[cur-1];
        }
        if(n[cur] == h[i])
            cur += 1;
        if(cur == nlen) return &h[i - nlen + 1];
    }
    return NULL;        
}

Answer 8

KMP algorithm:

int [] getNext(int p[],int n){
    int next[] = new int[n];
    next[0] = -1;
    int k = -1;int i = 0;
    while(i < n-1){
        if(k == -1 || p[i] == p[k]){k++;i++;next[i] = k;}
        else k = next[k];
    }
}
int kmp(int next[],char s[],char p[],int m,int n){
     int i = 0;
     int j = 0;
     while(i < m){
         if(j == -1 || p[j] == s[i]){i++;j++;}
         else j = next[j];
         if(j == n) return i-n+1;
     }
}

//hope there is no bug ~

Answer 9

I'd like to think my solution is kind of straightforward.

class Solution {
public:
    char *strStr(char *haystack, char *needle) {

        if(!haystack || !needle) return NULL;

        char *match = needle;

        if(*needle == '\0' ) return haystack;

        for(char *c = haystack, *bookmark = haystack; *c != '\0'; c++){
            if(*c == *match){
                if(match == needle)
                    bookmark = c;
                match++;
            } 
            else if (match>needle){
                match = needle;
                c = bookmark;
            }

            if(*match=='\0')
                return c - strlen(needle) + 1;
        }

        return NULL;
    }
};

Answer 10

I doubt there is something wrong with the judge program of this problem I get wrong answer feedback attached below:

Input: "mississippi", "issi" Output: "ississi" Expected: "ississippi"

Anybody have any idea about this?

Implement strStr()

Follow this question

Welcome to LeetCode

Related questions