Sign up ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I am trying to implement substring in C. How efficient and standard is this code? I have a hard time understanding how C functions are usually meant to work, like how printf() returns an int or size_t.

size_t substring(char *destination, char const *string,  int start, int len) {
    // writes a substring to the destinaton starting at start index of string until end stepping by step
    // follows the same start at 0 off by one on the end format as Java substring method
    int substringLength = len;

    int stringLength = strlen(string);

    if (start > stringLength || len > stringLength || start < 0 || len < 0) {
        fputs("start and len must be 0 < start/len < length of string", stderr);
        return -1;
    }
    if (start >= len) {
        fputs("start must be start < len", stderr);
        return -1;
    }

    memcpy(destination, &string[start], substringLength);

    if (destination[stringLength] != '\x0') {
        destination[stringLength] = '\x0';
    }
    return substringLength;
}
share|improve this question
1  
It would be good to note that you could just use the standard memmove() or memcpy() to accomplish this. – syb0rg yesterday
    
No this is C... – Tom yesterday
    
I fail to see your point? – syb0rg 18 hours ago
    
Someone asked it was c++ because of the double slash comments which weren't allowed in standard c for some time so I said no it was c – Tom 17 hours ago
    
Ahh, my bad I thought that was a response to my comment haha. They were referring to the old C89 standard when they thought of that, but // were introduced in C99. – syb0rg 17 hours ago

5 Answers 5

Argument confusion

If you were trying to replicate the java substring function, you should have had a start and end argument. As it is, you have a start and length argument, but you didn't handle the length argument correctly.

Bug 1

This line doesn't correctly check for the case of copying off the end of the source string:

if (start > stringLength || len > stringLength || start < 0 || len < 0) {

It needs to be:

if (start + len > stringLength || start < 0 || len < 0) {

Edit: If you want to check for integer overflow, you can also throw in || start + len < 0, and use size_t everywhere instead of int.

Bug 2

This check is unnecessary and can return an error incorrectly:

if (start >= len) {
    fputs("start must be start < len", stderr);
    return -1;
}

For example, if the string has 10 bytes, start is 8 and len is 1, this check will return an error.

Bug 3

When you terminate the destination string, you use the wrong index. It should be len, not stringLength. Also, I'm not sure why you do a check before you write to it:

if (destination[stringLength] != '\x0') {
    destination[stringLength] = '\x0';
}

should be:

destination[len] = '\0';

Also, '\0' is the standard way to write the null character. I've never seen it written as '\x0' before.

Unnecessary local variable

The variable substringLength is unnecessary because it is always the same as len. You can just use len instead.

Return value

The java substring function returns the string. Your function returns the length but I don't think that this will be useful to the caller. The reason you might want to return a string is so you can chain calls like this:

strcat(substring(buf, path, dir_start, dir_length), basename);
share|improve this answer
    
Note: start + len > stringLength is not safe against integer overflow. – immibis yesterday
    
On the other hand, if trying to emulate the .NET Substring(), then passing the length would be right. – 200_success 47 mins ago

I have a hard time understanding how C functions are usually meant to work

The best thing to do is to read the documentation, which is only a Google search away.

For instance, you'll determine that strlen() returns a size_t, so stringLength should also be a size_t, same with len. Mixing this with an int or other signed type can cause "possible loss of data" warnings.

Also, if you have a string called destination, then the other one should be called source.

share|improve this answer

char *destination needs a size

Any new string function can be made far more robust by passing in the size of the destination array. The minor performance saving by not having to do size limitations is far outweighed by the grief cause by message overruns.

Use size_t for all string sizes and indexes. Type int may be of insufficient range especially for this function whose goal is to make sub-strings of larger strings.

Avoid an argument name like string. Use a name that describes its role.

// size_t substring(char *destination, char const *string,  int start, int len) {
size_t substring(char *destination, size_t dsize, char const *source,  
    size_t start, size_t len) {

Investigate restrict. Your present code has trouble if destination and string overlap. Either adjust code to handle that or use restrict to indicate your function can not cope with overlap.

Define corner cases

Instead of returning (size_t)-1, on "start and len must be 0 < start/len < length of string", define a plausible functionality.

Along with using size_t dsize, there should be no error conditions possible. All possible legitimate inputs should result in a defined result.

// dest and source may overlap
void substring1(char *dest, size_t dsize, const char *source, size_t start, size_t length) {
  size_t source_len = strlen(source);
  if (start > source_len) start = source_len;
  if (start + length*1ull > source_len) length = source_len - start;
  if (length + 1 > dsize) {
    if (dsize == 0) {
      return;
    }
    length = dsize - 1;
  }
  memmove(dest, &source[start], length);
  dest[length] = 0;
}

Ref: Sample implementations

share|improve this answer

By first glance it looks kind of OK, but there are at least a few bugs:

  • Given a start > 0 and len < strlen(string), but still large, then your memcpy might end up writing beyond allocated memory
  • Why the start < len test? Given start = 5 and len = 2, but string = 'A somewhat long string'. It should be allowed to copy 2 characters from that string. I think what you mean to check for is something like (start + len) <= stringLength

You should also be aware that C also have substring functions in default libraries, like strncpy or strcpy, so you're kind of reinventing the wheel.

share|improve this answer

Printing an error message to stderr and returning -1 if argument validation fails isn't very good practice. For one thing, there's no way to distinguish different error cases; for another, there's no way to avoid the log spam.

If an error you detect is always programmer error then you should catch it with an assertion that stops the program dead in its tracks (or better yet, a static assertion that stops the program from compiling). Otherwise, if the error could be caused by user input, you should return an error code (preferably a different error code for each case) and let your caller decide how the error should be handled. Printing a canned error to the screen may not be what the user of your function wants!

Finally, you have the opportunity to decide that some cases aren't errors at all. It can be convenient, and reduce the number of special cases in calling code, to allow the boundaries of a substring to be outside of the source string, or to allow the end position to be less than the start position, and to just trim the output string apprirately, and many standard substring functions allow some or all of these combinations.

For example, you might use the substring function to get the "first 80 characters of headline" to display a summary, but it doesn't need to be an error if the headline is less than 80 characters to begin with; the summary in that case would just be the whole headline. You might also get the rest of the headline by asking for a substring beginning at character 80, for a short headline this can just be the empty string instead of returning an error. This preserves the property that headline = summary + remainder. There are tradeoffs involved in stronger and weaker validation, but my point is that if you can come up with meaningful behavior for a set of arguments, then your function might be more useful by allowing it, rather than being needlessly restrictive.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.