Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I'm a programming student in my first C++ class, and recently we were encouraged to write a simple recursive function to find the first occurrence of a substring in a given string. If found, it returns the index. If the substring is not found, the index_of() function should return -1. We are encouraged to use a helper function that takes the index as one of its parameters, and this is what I've tried.

For example:

int index_of("Mississippi", "sip"); // this would return a 6

This is supposed to be a simple exercise to help us understand recursion and won't be turned in. My professor stated that our actual assignment with recursion will be much more involved, which is why I really want to understand this simple use of recursion.

I've done this successfully using C-style strings and pointers, but not with C++ std::string objects. What am I doing wrong in my program? My professor stated we should easily be able to write this in 5 mins, but I've been struggling with it for two hours. Here's what I've done so far:

int index_of(string s, string t)
{
    int index = 0;

    if (s[index] == NULL)
        return -1;
    else if (starts_with(s, t, ++index))
    {
        return index;
    }
    else 
        return index;
}

bool starts_with(string s, string t, int index)
{
    if (t[index] == NULL)
        return true;
    if ( s[index] == NULL || t[0] != s[index])
        return false;
    return starts_with(s, t, ++index);
}

As written, this function always returns an index of 1.

share|improve this question
4  
Well done for putting in a lot of effort to try and solve this yourself before posting the question on here. Very refreshing! –  LeopardSkinPillBoxHat Feb 22 '10 at 3:57
    
index never gets updated in index_of after you incremented by ++. consider up returning index from starts_with rather than the flag –  Anycorn Feb 22 '10 at 3:58
3  
For those playing at home, this continues from stackoverflow.com/questions/2307146 –  glasnt Feb 22 '10 at 3:59
    
Just to be clear, this is not a turn-in assignment, I know I asked a question there before, but I really have been trying to understand this. I'm really struggling, I've been drawing pictures, I'm asking here as a last resort. –  Alex Feb 22 '10 at 4:02

5 Answers 5

up vote 7 down vote accepted
int index_of(string s, string t)
{
    int index = 0;

    if (s[index] == NULL)

Full stop. This isn't how C++'s strings work and you must fix this if you want to use them. Even with C-style strings, don't use NULL to mean the ASCII null character. They share a name but have different purposes, and you should not use NULL to mean integer zero (chars are integer types and the null character is their zero value). Use '\0' or just if (s[index]).

However, you aren't allowed to index a std::string unless you know the index is valid. To do that, compare the index against s.size() (and make sure it's greater than or equal to 0). Even so, what you are really testing here is if s is empty, and it has a special method to do that:

    if (s.empty())

Continuing:

    else if (starts_with(s, t, ++index))

Increment and decrement inside expressions, especially as here, can be confusing for the beginner with no advantage. The main advantage of them is code that is succinct and clear, but you have to already understand the main part of the code first, and even then experienced programmers sometimes benefit from being a tiny bit more verbose.

Anecdotally, Go's creators, who were also involved in early C history, even turned increment from an expression into a statement, and I believe clarity is a large part of the reason.


From the beginning

You want to implement a function with this signature:

int index_of(string haystack, string needle);
// returns the index of needle in haystack, if found
// otherwise returns -1

I include those comments with the signature on purpose: they are part of the public interface for this function. Better parameter names also increase clarity.

Identify the cases you need to consider:

  • needle is empty (you can handle this in multiple ways)
  • haystack is empty: return -1
  • at this point we know both haystack and needle are not empty
  • that leaves the two cases that are the crux of the algorithm:
    • first character of haystack does not match the first character of needle
    • there is a match of the first character

And when there is a match of the first character, you have two sub-cases:

  • there are no more characters in needle: match found
  • there are more characters: continue checking

I've written these as a recursive algorithm which receives "new copies" of each string (and substring) instead of using indices. However, you can transform to use indices by changing "first character" to "current character", and similarly for the "empty" conditions. You will want to use two indices in that case (and trying to only use one may have been a major stumbling block for you so far), unless you have a helping function to compare substrings (though I'm unsure if your professor had a separate intention with this comment).

A direct translation of the above prose into code:

int index_of(string haystack, string needle) {
  if (needle.empty()) return 0;
  // this implementation considers empty substrings to occur at the start of any
  // string, even an empty haystack; you could also make it an error to call
  // index_of when needle is empty, or just return -1

  if (haystack.empty()) return -1;

  assert(!needle.empty() && !haystack.empty()); // I wouldn't normally include
  // this, since we just checked these conditions, but this is the "at this
  // point we know both haystack and needle are not empty" that I mentioned

  if (haystack[0] != needle[0]) {
    // mark A, see below
    int index = index_of(haystack.substr(1), needle);
    return index != -1 ? index + 1 : index;
  }

  if (needle.length() == 1) return 0; // found complete match
  // note the way I chose to handle needle.empty() above makes this unnecessary

  // mark B, see below    
  // partial match (of the first character), continue matching
  int index = index_of(haystack.substr(1), needle.substr(1)); // strip first
  return index == 0 ? 0 : -1;
  // must check index == 0 exactly, if -1 then we must return that, and if not 0
  // then we've found a "broken" needle, which isn't a real match
}

The broken needle comment hints at how inefficient that code is, as it bifurcates the recursive calls into two categories: must match at 1 (which is 0 after slicing into substrings), at mark B, and can match anywhere, at mark A. We can improve this with a helper function, and I'll use std::string's operator== overload (operating on a substring of haystack) for that. This yields the recursive equivalent of the classical "naive strstr":

int index_of(string haystack, string needle) {
  if (needle.empty()) return 0;
  if (haystack.empty()) return -1;
  if (haystack.substr(0, needle.length()) == needle()) {
    return 0;
  }
  int index = index_of(haystack.substr(1), needle);
  if (index != -1) index++;
  return index;
}

And when using an index for haystack with string::compare as the helper so a needle index isn't required:

// might not be exposed publicly, but could be
int index_of(string const& haystack, int haystack_pos, string const& needle) {
  // would normally use string const& for all the string parameters in this
  // answer, but I've mostly stuck to the prototype you already have

  // shorter local name, keep parameter name the same for interface clarity
  int& h = haystack_pos;

  // preconditions:
  assert(0 <= h && h <= haystack.length());

  if (needle.empty()) return h;
  if (h == haystack.length()) return -1;
  if (haystack.compare(h, needle.length(), needle) == 0) {
    return h;
  }
  return index_of(haystack, h+1, needle);
}

int index_of(string haystack, string needle) {
  // sets up initial values or the "context" for the common case
  return index_of(haystack, 0, needle);
}

Notice this version is tail-recursive, but this is still a naive algorithm and more advanced ones exist.


If I had more time, I would have written a shorter letter.
    — Cicero

You've said this is helped you a lot, but, even with the additional examples I just included, it seems lacking to me. Substring-search is not a good recursion exercise, in my opinion, and that could be why.

share|improve this answer
    
wow, really great breakdown for me. Many thanks to you my friend! You helped me solve the problem while at the same time understand recursion. If only I could give you more rep... –  Alex Feb 22 '10 at 6:56
    
@Alex: Glad I could help. It's not about the rep, but thanks anyway. :) –  Roger Pate Feb 22 '10 at 7:12

Your code boils down to this

int index_of(string s, string t)
{
    int index = 0;

    //if (s[index] == NULL)
    //    return -1;
    ++index // from this: else if (starts_with(s, t, ++index))
    //{
    //    return index;
   // }
   //else 
        return index;
}

so, yes, it always returns 1

index continues to increment inside the recursive starts_with function, but the changed values never make it back out to be your return value.

share|improve this answer

You are not updating the index variable in the index_of() function. When you pass it to starts_with() it makes a copy on the stack. You need to return the updated value somehow - ether take it by reference/pointer or return it instead of bool.

share|improve this answer

I don't think 5 minutes was at all a reasonable estimate for an intro class student. To help you get even help you understand the issue, I've written what I think is the final answer...

#include <string>
#include <iostream>
using namespace std;

int starts_with(string s, string sub, unsigned int i) {
  if (i >= s.length())
    return -1;
  if (s.compare(i, sub.length(), sub) == 0)
    return i;
  return starts_with(s, sub, i + 1);
}
int index_of(string s, string sub) { return starts_with(s, sub, 0U); }
int main(void) { cout << index_of("Mississippi", "sip") << "\n"; }
share|improve this answer
    
Sigh, a drive-by downvote with no comment ... I don't need the points but it always makes me wonder why ... –  DigitalRoss Feb 22 '10 at 7:20
    
what does the OU stand for in your code? –  starcorn Feb 22 '10 at 8:22
    
Oh, the U suffix means unsigned –  DigitalRoss Feb 22 '10 at 9:01

(note - edited for simplicity)

Not sure if this'll help as I'm not sure to what extent recursion was explained to you by your teacher, but here goes...

This is how you need to think of it - A recursive function contains two main components:

1) a Base Case and

2) a Recursive Case

The key is that on each call, you are determining which of the two cases is true for this run of the function (based on the input parameter(s)).

The base case is where the function is not called again, and the recursive case always calls the function, and is usually more complicated in its logic than the base case.


So I'd suggest starting over. Think about what the input should be such that on a function call with that input the function does not call itself ---> that's your base case.

If on a function call it isn't the base case, it's the recursive case. So your function needs to look like this:

function index_of(params...){
    if base case
        do something simple and return
    else
        do something and call index_of again
}

Hint: In your function, there is no recursive case.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.