Efficiency/design of trim() library code

Question

I am working on some library code and I am trying to optimize my trim() functions. To that effect I am trying to figure out how best to deal with each kind of input. Move semantics are starting to fry my brains a little. Sometimes I think I understand them... but not very often.

What I want to do with this piece of code is take advice in all areas:

Nomenclature:

I am open to suggestions for function names. For example do I name them after the type of parameter or the type of value (trim_cref vs trim_rval)? What do people prefer?

Unified function name based on overloads:

Also it would be nice to end up with a universal set of functions that select the most efficient method based on input parameter without having a suffix to select between them:

std::string trim(std::string s);
std::string& trim(std::string& s); // mutating
std::string trim(std::string&& s);
std::string trim(const std::string& s);

I am pretty sure that's not completely achievable due to ambiguity. The question is what version should I leave out? The copy or the mutating?

Efficiency of basic algorithm:

I am using this basic algorithm:

std::string& trim(std::string& s, const char* t = ws)
{
    s.erase(0, s.find_first_not_of(t));
    s.erase(s.find_last_not_of(t) + 1);
    return s;
}

I would like to think the library is smart enough to special-case s.erase() when the first parameter is zero or the last parameter is std::string::npos. If not is there a faster way to do this?

Efficiency of calling:

I have implemented these functions in terms of each other, but I am not sure (at all) if what I have done is most efficient. Or even if I am not reducing efficiency (by inhibiting RVO for example).

Am I even on the right track?

Do I even need 4 functions (or overloads)? Would it be just as efficient to simply take parameters as copies and rely on RVO?

const char* const ws = " \t\n\r\f\v";

inline std::string& ltrim_mute(std::string& s, const char* t);
inline std::string ltrim_move(std::string&& s, const char* t);
inline std::string ltrim_cref(const std::string& s, const char* t);
inline std::string ltrim_copy(std::string s, const char* t);

inline std::string& rtrim_mute(std::string& s, const char* t);
inline std::string rtrim_move(std::string&& s, const char* t);
inline std::string rtrim_cref(const std::string& s, const char* t);
inline std::string rtrim_copy(std::string s, const char* t);

inline std::string& trim_mute(std::string& s, const char* t);
inline std::string trim_move(std::string&& s, const char* t);
inline std::string trim_cref(const std::string& s, const char* t);
inline std::string trim_copy(std::string s, const char* t);

// LEFT

inline std::string& ltrim_mute(std::string& s, const char* t = ws)
{
    s.erase(0, s.find_first_not_of(t));
    return s;
}

inline std::string ltrim_move(std::string&& s, const char* t = ws)
{
    return std::move(ltrim_mute(s, t));
}

inline std::string ltrim_cref(const std::string& s, const char* t = ws)
{
    return std::move(ltrim_move(std::string(s),t));
}

inline std::string ltrim_copy(std::string s, const char* t = ws)
{
    return std::move(ltrim_move(std::move(s),t));
}

// RIGHT

inline std::string& rtrim_mute(std::string& s, const char* t = ws)
{
    s.erase(s.find_last_not_of(t) + 1);
    return s;
}

inline std::string rtrim_move(std::string&& s, const char* t = ws)
{
    return std::move(rtrim_mute(s, t));
}

inline std::string rtrim_cref(const std::string& s, const char* t = ws)
{
    return std::move(rtrim_copy(s, t));
}

inline std::string rtrim_copy(std::string s, const char* t = ws)
{
    return std::move(rtrim_move(std::move(s),t));
}

// BOTH

inline std::string& trim_mute(std::string& s, const char* t = ws)
{
    return rtrim_mute(ltrim_mute(s, t), t);
}

inline std::string trim_move(std::string&& s, const char* t = ws)
{
    return std::move(rtrim_mute(ltrim_mute(s, t), t));
}

inline std::string trim_cref(const std::string& s, const char* t = ws)
{
    return std::move(rtrim_move(ltrim_cref(s, t), t));
}

inline std::string trim_copy(std::string s, const char* t = ws)
{
    return std::move(rtrim_mute(ltrim_mute(s, t), t));
}

godlygeek · Answer 1 · 2014-08-06 17:21:59Z

I'll take a swing at this:

2) Unified function name based on overloads: ... Also it would be nice to end up with a universal set of functions that select the most efficient method based on input parameter without having a suffix to select between them: ... I am pretty sure that's not completely achievable due to ambiguity. The question is what version should I leave out? The copy or the mutating?

I think that it would be a very bad idea to have 2 overloads of the same function name where one mutates its argument and the other returns a copy. Overloads that don't perform the same semantic operation shouldn't be overloads, at least in my mind. So, I'd pick 2 different names instead, say "inplace_trim" and "trim". If you take that approach, then it would make perfect sense to have overloaded versions of "trim" for by-value, by-reference, by-const-reference, and by-xvalue.

This also solves:

1) Nomenclature:

I am open to suggestions for function names. For example do I name them after the type of parameter or the type of value (trim_cref vs trim_rval)?

This way you only need to have 2 names, "inplace_trim" and "trim". You don't need any funky naming convention since it allows overloading to work.

3) Efficiency of basic algorithm:

I have only one thought here, and it's a minor one: It will be slightly more efficient to trim the trailing whitespace first, then the leading. The reason is that eraseing from the front of a string is just like eraseing from the front of a vector; it requires copying every character after the erasure point into its new location in the sequence. Erasing from the end just updates pointers, so if you do the erase from the end first, there are fewer characters needing copying during the erase from the beginning.

Efficiency of calling:

It has just occurred to me that you're doing more copying than necessary in the case where there are actually leading spaces in your strings. The way that you implement ltrim_cref, for instance, you copy the input string and pass the copy into ltrim_move, which then deletes the leading whitespace by shifting every character after the erasure point left (another hidden copy). You could avoid this extra copy by instead doing (untested):

inline std::string ltrim_cref(const std::string& s, const char* t = ws)
{
    return s.substr(std::min(s.size(), s.find_first_not_of(t)));
}

So it seems like, by trying to implement these functions in terms of each other, you may be missing out on some optimization opportunities.

Other thoughts:

I'd rename t to chars_to_skip or something like that - t isn't a particularly meaningful name.
If ws is going to be exposed as part of your API through your header file, I'd make sure that a) it's in a namespace to avoid namespace pollution, and b) it gets a more descriptive name, like StringTrimming::ASCII_WHITESPACE.

Great points. Funny thing is I was originally using only two function names (one for copy, one for mutate) before I decided to try to get clever about it in order to try to take advantage of rvalue references. You are completely right about not mixing semantics. — Galik, yesterday
@Galik, I just added some points about "Efficiency of calling" that I just thought of. Also, on a similar line of thinking, depending on how much you care about efficiency you may want to consider using Boost's string_ref here - that would let you return something string-like, without ever needing to pay the cost for a copy under any circumstance (you'd have to pay the managerial overhead of ensuring the original string being referenced stays valid, instead). — godlygeek, yesterday
+1 "I'd rename t to chars_to_skip or something like that - t isn't a particularly meaningful name." "If ws is going to be exposed as part of your API through your header file, I'd make sure that a) it's in a namespace to avoid namespace pollution, and b) it gets a more descriptive name, like StringTrimming::ASCII_WHITESPACE." Meaningful names are everything. — Pharap, yesterday

asked	yesterday
viewed	303 times
active	yesterday

current community

your communities

more stack exchange communities

Efficiency/design of trim() library code

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged c++ performance strings c++11 library or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Efficiency/design of trim() library code

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged c++ performance strings c++11 library or ask your own question.

Related

Hot Network Questions