int index_of(string s, string t)
{
int index = 0;
if (s[index] == NULL)
Full stop. This isn't how C++'s strings work and you must fix this if you want to use them. Even with C-style strings, don't use NULL to mean the ASCII null character. They share a name but have different purposes: NULL is a null pointer constant, and you should not use it to mean integer zero (chars are integer types and the null character is their zero value). Use '\0' or just if (!s[index]).
However, you aren't allowed to index a std::string unless you know the index is valid. To do that, compare the index against s.size() (and make sure it's greater than or equal to 0). Even so, what you are really testing here is whether s is empty, and it has a special method to do that:
if (s.empty())
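If you do need the general check later, here is a minimal sketch of it. The helper names valid_index and has_first_char are my own inventions for illustration, not anything from your code or the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original code): reports whether
// s[index] may be used, given a signed index as in the question.
bool valid_index(const std::string& s, int index) {
    return index >= 0 && static_cast<std::size_t>(index) < s.size();
}

// For index 0 the check reduces to "is there at least one character",
// which is what empty() answers directly.
bool has_first_char(const std::string& s) {
    return !s.empty();
}
```

For index 0 the two functions always agree, which is why empty() is the clearer spelling here.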
Continuing:
else if (starts_with(s, t, ++index))
Increment and decrement inside expressions, especially as used here, can confuse a beginner while offering no advantage. Their main benefit is succinct, clear code, but that only pays off once you already understand what the surrounding code does, and even then experienced programmers sometimes benefit from being a tiny bit more verbose.
Anecdotally, Go's creators, who were also involved in early C history, even turned increment from an expression into a statement, and I believe clarity is a large part of the reason.
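To make that concrete, here is a sketch of the more verbose spelling. Your starts_with isn't shown, so the version below is a stand-in whose signature and behavior I'm assuming (true when t occurs in s beginning at position pos); advance_and_check is likewise a made-up name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the question's starts_with; the real definition
// is not shown, so this signature and behavior are assumptions.
bool starts_with(const std::string& s, const std::string& t, int pos) {
    return pos >= 0
        && static_cast<std::size_t>(pos) <= s.size()
        && s.compare(pos, t.size(), t) == 0;
}

// Same effect as starts_with(s, t, ++index), with the update on its own line.
bool advance_and_check(const std::string& s, const std::string& t, int& index) {
    index += 1;                       // the update is now a visible step
    return starts_with(s, t, index);  // the call only reads index
}
```

Nothing is gained at runtime either way; the point is only that a reader sees the index change before seeing it used.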
From the beginning
You want to implement a function with this signature:
int index_of(string haystack, string needle);
// returns the index of needle in haystack, if found
// otherwise returns -1
I include those comments with the signature on purpose: they are part of the public interface for this function. Better parameter names also increase clarity.
Identify the cases you need to consider:
- needle is empty (you can handle this in multiple ways)
- haystack is empty: return -1
- at this point we know both haystack and needle are not empty
- that leaves the two cases that are the crux of the algorithm:
  - first character of haystack does not match the first character of needle
  - there is a match of the first character
And when there is a match of the first character, you have two sub-cases:
- there are no more characters in needle: match found
- there are more characters: continue checking
I've written these as a recursive algorithm which receives "new copies" of each string (and substring) instead of using indices. However, you can transform it to use indices by changing "first character" to "current character", and similarly for the "empty" conditions. You will want to use two indices in that case (and trying to use only one may have been a major stumbling block for you so far), unless you have a helper function to compare substrings (though I'm unsure if your professor had a separate intention with this comment).
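For the index-based variant just described, a rough sketch with two indices might look like the following. The names index_of_at, index_of_two_indices, start and n are all my own choices for illustration, and this is only one way to arrange it.

```cpp
#include <cstddef>
#include <string>

// Sketch of the index-based variant (names are illustrative assumptions).
//   start: where the current candidate match begins in haystack
//   n:     how many needle characters have matched so far
int index_of_at(const std::string& haystack, const std::string& needle,
                std::size_t start, std::size_t n) {
    // "needle is empty": every needle character has been matched
    if (n == needle.size()) return static_cast<int>(start);
    // "haystack is empty": nothing left to compare against
    if (start + n == haystack.size()) return -1;
    // current characters match: keep checking the same candidate
    if (haystack[start + n] == needle[n])
        return index_of_at(haystack, needle, start, n + 1);
    // mismatch: try the next starting position with the whole needle
    return index_of_at(haystack, needle, start + 1, 0);
}

int index_of_two_indices(const std::string& haystack, const std::string& needle) {
    return index_of_at(haystack, needle, 0, 0);
}
```

Note how a mismatch resets n to 0 while advancing start by one; that reset is the part a single index cannot express.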
A direct translation of the above prose into code:
int index_of(string haystack, string needle) {
    if (needle.empty()) return 0;
    // this implementation considers empty substrings to occur at the start of any
    // string, even an empty haystack; you could also make it an error to call
    // index_of when needle is empty, or just return -1
    if (haystack.empty()) return -1;
    assert(!needle.empty() && !haystack.empty()); // I wouldn't normally include
    // this, since we just checked these conditions, but this is the "at this
    // point we know both haystack and needle are not empty" that I mentioned
    if (haystack[0] == needle[0]) {
        if (needle.length() == 1) return 0; // found complete match
        // note the way I chose to handle needle.empty() above makes this unnecessary
        // mark B, see below
        // partial match (of the first character), continue matching
        int index = index_of(haystack.substr(1), needle.substr(1)); // strip first
        if (index == 0) return 0;
        // must check index == 0 exactly; any other result means the rest of needle
        // does not start right here (a "broken" needle, which isn't a real match),
        // so fall through and look for the whole needle later in haystack
    }
    // mark A, see below
    int index = index_of(haystack.substr(1), needle);
    return index != -1 ? index + 1 : index;
}
The broken needle comment hints at how inefficient that code is, as it bifurcates the recursive calls into two categories: must match at 1 (which is 0 after slicing into substrings), at mark B, and can match anywhere, at mark A. We can improve this with a helper function, and I'll use std::string's operator== overload (operating on a substring of haystack) for that. This yields the recursive equivalent of the classical "naive strstr":
int index_of(string haystack, string needle) {
    if (needle.empty()) return 0;
    if (haystack.empty()) return -1;
    if (haystack.substr(0, needle.length()) == needle) {
        return 0;
    }
    int index = index_of(haystack.substr(1), needle);
    if (index != -1) index++;
    return index;
}
And when using an index for haystack with string::compare as the helper so a needle index isn't required:
// might not be exposed publicly, but could be
int index_of(string const& haystack, int haystack_pos, string const& needle) {
    // would normally use string const& for all the string parameters in this
    // answer, but I've mostly stuck to the prototype you already have

    // shorter local name, keep parameter name the same for interface clarity
    int& h = haystack_pos;

    // preconditions:
    assert(0 <= h && h <= haystack.length());

    if (needle.empty()) return h;
    if (h == haystack.length()) return -1;
    if (haystack.compare(h, needle.length(), needle) == 0) {
        return h;
    }
    return index_of(haystack, h+1, needle);
}

int index_of(string haystack, string needle) {
    // sets up initial values or the "context" for the common case
    return index_of(haystack, 0, needle);
}
Notice this version is tail-recursive, but this is still a naive algorithm and more advanced ones exist.
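Because the recursive call is the last thing that version does, it can be rewritten mechanically as a loop. Here is a sketch of that rewrite; index_of_loop is a name I picked for this example, and the algorithm is the same naive one.

```cpp
#include <cstddef>
#include <string>

// Loop form of the tail-recursive version: each recursive call with h + 1
// becomes one more loop iteration. The name index_of_loop is an assumption
// made for this sketch.
int index_of_loop(const std::string& haystack, const std::string& needle) {
    if (needle.empty()) return 0;  // same convention as the versions above
    for (std::size_t h = 0; h < haystack.length(); ++h) {
        // compare the slice of haystack starting at h against all of needle
        if (haystack.compare(h, needle.length(), needle) == 0) {
            return static_cast<int>(h);
        }
    }
    return -1;  // ran off the end of haystack without a match
}
```

Outside of an exercise you would normally reach for std::string::find, which returns std::string::npos rather than -1 when there is no match.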
If I had more time, I would have written a shorter letter.
— Cicero
You've said this has helped you a lot, but, even with the additional examples I just included, it seems lacking to me. Substring search is not a good recursion exercise, in my opinion, and that could be why.