Code Review Stack Exchange is a question and answer site for peer programmer code reviews.

For various reasons, I'm parsing a string; this code shows what I'm after:

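        string baseString = "This is a \"Very Long Test\"";

        string[] strings = baseString.Split(' ');

        List<string> stringList = new List<string>();
        string temp = String.Empty;   // accumulates a quoted group in progress
        foreach (var s in strings)
        {
            if (!String.IsNullOrWhiteSpace(temp))
            {
                // Inside a quoted group: append the word.
                temp = temp + " " + s;
                if (s.EndsWith("\""))
                {
                    // Closing quote found: strip both quotes and emit the group.
                    stringList.Add(temp.Substring(1, temp.Length - 2));
                    temp = String.Empty;
                }
            }
            else if (s.Length > 1 && s.StartsWith("\"") && s.EndsWith("\""))
            {
                // A single quoted word such as "word".
                stringList.Add(s.Substring(1, s.Length - 2));
            }
            else if (s.StartsWith("\""))
            {
                // Opening quote: start a new group.
                temp = s;
            }
            else
            {
                stringList.Add(s);
            }
        }

        stringList.ForEach(Console.WriteLine);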

The output should be:

This
is
a
Very Long Test

Basically, given a string, it splits it on spaces, unless the words are grouped inside speech marks (double quotes), the same way the command line does it.

Is there a better way to write this code?

It's usually a bad idea to concatenate strings in a loop. If performance matters to you, you should probably use StringBuilder instead. –  svick Apr 12 '12 at 19:06
    
What about something like "This is a \"Very Long\" Test\""? Is there a guarantee that the \"s will come in pairs? Otherwise the problem is too ill-defined for a proper solution. –  Arjang Apr 13 '12 at 4:37
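Picking up svick's StringBuilder suggestion and Arjang's question about unpaired quotes, here is a minimal single-pass sketch that walks the characters once instead of splitting and re-joining words. `QuotedSplitter` and `Split` are hypothetical names, and the rules are assumptions rather than anything the question specifies: a `"` toggles quoted mode and is dropped, spaces outside quotes separate tokens, runs of spaces produce no empty tokens, and an unmatched quote runs to the end of the string.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; the name and the splitting rules are assumptions for this sketch.
public static class QuotedSplitter
{
    // Splits on spaces, except inside double quotes. The quotes themselves are removed.
    // Assumption: an unmatched quote keeps everything up to the end of the string together.
    public static List<string> Split(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        bool hasToken = false; // true once a token has started, even an empty one such as ""

        foreach (char c in input)
        {
            if (c == '"')
            {
                // Toggle quoted mode; the quote character is not copied.
                inQuotes = !inQuotes;
                hasToken = true;
            }
            else if (c == ' ' && !inQuotes)
            {
                // An unquoted space ends the current token, if there is one.
                if (hasToken)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                    hasToken = false;
                }
            }
            else
            {
                // Any other character, including a space inside quotes, is kept.
                current.Append(c);
                hasToken = true;
            }
        }

        // Flush the last token (this also covers an unterminated quote).
        if (hasToken)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

With these rules, Arjang's example `This is a "Very Long" Test"` yields `This`, `is`, `a`, `Very Long`, `Test`. A quote in the middle of a word joins the quoted part to the word (`a"b c"d` gives the single token `ab cd`), which is similar to how many shells behave, although real command-line parsers differ in their details.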

2 Answers

Accepted answer (score 4)

Seems like a job for a regular expression:

    using System.Linq;
    using System.Text.RegularExpressions;

    string baseString = "This is a \"Very Long Test\"";
    var re = new Regex("(?<=\")[^\"]*(?=\")|[^\" ]+");
    var strings = re.Matches(baseString).Cast<Match>().Select(m => m.Value).ToArray();

The regular expression (?<=")[^"]*(?=")|[^" ]+ matches either a sequence of zero or more characters that are not " ([^"]*), preceded by a " ((?<=")) and followed by a " ((?=")), or a sequence of one or more characters that are neither " nor a space ([^" ]+).

For the sample input, it gives the same output as your version. The code itself is much simpler, but the regular expression might be hard to understand, especially if you're not used to them.
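For anyone who wants to try the answer's pattern, here is a self-contained version with the `using` directives it needs; `RegexSplitter` is a hypothetical wrapper name, and the pattern itself is unchanged. It reproduces the sample output. One thing worth knowing: the lookbehind only checks that *some* quote precedes the match, not that it is an opening quote, so with two quoted groups the text between them is matched as well. `"a" b "c"` yields `a`, ` b ` (spaces included), `c`.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's regular expression.
public static class RegexSplitter
{
    // Either a run of non-quote characters between two quotes (the lookarounds
    // keep the quotes out of the match), or a run of characters that are
    // neither quotes nor spaces.
    private static readonly Regex Token = new Regex("(?<=\")[^\"]*(?=\")|[^\" ]+");

    public static string[] Split(string input)
    {
        return Token.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }
}
```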

    
A header comment could easily resolve the hard-to-understand part? Other than that, I don't think it's any harder to understand than the original Q. –  dreza Apr 12 '12 at 19:42
    
Looks cool. Could you elaborate on why you need (?<=") and (?=") as opposed to just (")[^"](")? –  Leonid Apr 13 '12 at 2:37
    
@Leonid, the difference is that if you did it your way, the quotation marks would be part of the match, so you would need to remove them later, for example by using Select(m => m.Value.Trim('"')). –  svick Apr 13 '12 at 6:21
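Following up on Leonid's question, here is a sketch of a swapped-in alternative: a capture group instead of lookarounds. The quotes are consumed by the match and only the text between them is captured in group 1, so no later `Trim` is needed. Because each match consumes both of its quotes, a closing quote can never be reused as the opening quote of the next match, so `"a" b "c"` splits into `a`, `b`, `c`. `CaptureSplitter` is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: the quotes are part of the match, and group 1
// captures only the text between them.
public static class CaptureSplitter
{
    private static readonly Regex Token = new Regex("\"([^\"]*)\"|[^\" ]+");

    public static string[] Split(string input)
    {
        return Token.Matches(input)
            .Cast<Match>()
            // Group 1 succeeds only for the quoted alternative;
            // otherwise take the whole (unquoted) match.
            .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Value)
            .ToArray();
    }
}
```

A trailing unmatched quote is simply skipped by this pattern: `This is a "Very Long" Test"` gives `This`, `is`, `a`, `Very Long`, `Test`.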
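    string baseString = "This is a \"Very Long Test\"... not so long actually, eh?";
    // Even-indexed parts lie outside quotes; odd-indexed parts lie inside them.
    string[] aux = baseString.Split('"');
    List<string> tokens = new List<string>();
    for (int i = 0; i < aux.Length; ++i)
    {
        if (i % 2 == 0)
            // RemoveEmptyEntries drops the "" left by the space before a quote.
            tokens.AddRange(aux[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        else
            tokens.Add(aux[i]);
    }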

Note that a double quote in the middle of a word splits that word: a"ctuall"y becomes a, ctuall, y in the final result. If the last double quote is unmatched, the text from that quote to the end of the string is not split on spaces.
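The Split('"') loop can also be written as a single LINQ query using the index-aware overload of `SelectMany`. This is only a restatement of the same idea, with `AlternatingSplitter` as a hypothetical name; it passes `StringSplitOptions.RemoveEmptyEntries` so that the space before or after a quote does not produce an empty token. The behaviour for a mid-word quote and for an unmatched final quote is the same as described above.

```csharp
using System;
using System.Linq;

// Hypothetical helper expressing the split-on-quotes approach with LINQ.
public static class AlternatingSplitter
{
    public static string[] Split(string input)
    {
        return input.Split('"')
            // Even-indexed parts are outside quotes: split them on spaces.
            // Odd-indexed parts are inside quotes: keep them whole.
            .SelectMany((part, i) => i % 2 == 0
                ? part.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                : new[] { part })
            .ToArray();
    }
}
```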
