
I do not understand the output of this code:

public class StringDemo {
    public static void main(String[] args) {
        String blank = "";
        String comma = ",";
        System.out.println("Output1: " + blank.split(",").length);
        System.out.println("Output2: " + comma.split(",").length);
    }
}

I got the following output:

Output1: 1 
Output2: 0
What do you not understand about it? –  Raedwald 8 hours ago
@Raedwald The confusing part was that ",".split(",") could be expected to return the array ["",""], but it returns [] (an empty array of length 0, because split(",",0) removes empty strings at the end). So why was the empty string in the result array not removed in the case of "".split(",")? –  Pshemo 8 hours ago
    
+1 for drawing attention to that annoying Java WTF. Such an anti-feature makes it impossible to reasonably process CSV files using String.split –  Lukasz 8 hours ago
So what? Parsing CSVs with String.split() is wrong (fields can contain embedded commas). –  Andrew Medico 7 hours ago
@AndrewMedico No, since there's no official standard for CSVs, a valid definition for them could be that you get the columns simply by splitting the lines on commas. You can't say any definition is outright "wrong," as there is no "right." –  ahruss 6 hours ago

9 Answers

Documentation:

For the first line, System.out.println("Output1: " + blank.split(",").length), the relevant part is:

The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.

It simply returns an array containing the entire string; that's why the length is 1.
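A minimal sketch that exercises the no-match rule quoted above (the class name NoMatchDemo is only for illustration):

```java
import java.util.Arrays;

// If the separator never occurs in the input, split returns a
// one-element array whose only element is the input string itself.
class NoMatchDemo {
    public static void main(String[] args) {
        String[] blankParts = "".split(",");
        String[] wordParts = "abc".split(",");

        System.out.println(blankParts.length);          // 1
        System.out.println(blankParts[0].isEmpty());    // true
        System.out.println(Arrays.toString(wordParts)); // [abc]
    }
}
```

Printing the length rather than Arrays.toString(blankParts) is deliberate: an array holding a single empty string prints as [], which is easy to confuse with an empty array.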


For the second case, splitting "," on "," produces two empty strings, and String.split discards trailing empty strings, so the result is empty.

String.split silently discards trailing separators

See also Guava's StringsExplained.

+1, but this explains only the first result. Now it is time for the second part (trailing empty strings). –  Pshemo 9 hours ago
    
Simply put, split ignores the delimiter when it splits, so the result is empty –  Marco Acierno 9 hours ago
The Javadoc of the one-argument split method says: "This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array." That's the correct explanation of the second result. Two trailing empty strings get excluded. –  COME FROM 8 hours ago
Yeah, in theory everything is in the doc. But I always wonder where they find those guys whose writing you can read 10 times and still have to write a test program to understand what the method actually does... –  Lukasz 8 hours ago

Everything happens according to plan, but let's go through it step by step (hope you have some time).

According to the documentation (and source code) of the split(String regex) method:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero.

So when you invoke

split(String regex)

you are actually getting the result of the split(String regex, int limit) method, invoked as

split(regex, 0)

So here limit is set to 0.

You need to know a few things about this parameter:

  • If limit is positive, you are limiting the length of the result array to the number you specified, so "axaxaxaxa".split("x",2) will return the array ["a", "axaxaxa"], not ["a","a","a","a","a"]
  • If limit is 0, you are not limiting the length of the result array, but any trailing empty strings will be removed. For example:

    "fooXbarX".split("X")
    

    will at start generate array which will look like

    ["foo", "bar", ""]
    

    ("barX" split on "X" generates "bar" and ""), but since split removes all trailing empty strings, it will return

    ["foo", "bar"]
    
  • A negative limit behaves like a limit of 0 in that it does not limit the length of the result array. The only difference is that it does not remove empty strings from the end of the result array. In other words

    "fooXbarX".split("X",-1)
    

    will return ["foo", "bar", ""]
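The three limit cases above can be checked side by side with a small sketch (the class name SplitLimitDemo is just for illustration):

```java
import java.util.Arrays;

// Shows how the limit argument of String.split(String regex, int limit)
// changes the result for the examples used in the list above.
class SplitLimitDemo {
    public static void main(String[] args) {
        // Positive limit: at most 2 elements; the last one holds the rest.
        System.out.println(Arrays.toString("axaxaxaxa".split("x", 2))); // [a, axaxaxa]

        // Zero limit (what the one-argument split uses): trailing "" removed.
        System.out.println(Arrays.toString("fooXbarX".split("X")));     // [foo, bar]
        System.out.println(Arrays.toString("fooXbarX".split("X", 0)));  // [foo, bar]

        // Negative limit: no length limit, and trailing "" is kept.
        System.out.println(Arrays.toString("fooXbarX".split("X", -1))); // [foo, bar, ]
    }
}
```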


Let me first answer the case

",".split(",").length

which, as explained earlier, is effectively the same as

",".split(",", 0).length

This means that we are using the version of split which does not limit the length of the result array, BUT removes all trailing empty strings "". You need to understand that when we split ONE thing, we always get TWO things.
In other words, if we split "abc" at b, we get "a" and "c".
The tricky part is to understand that if we split "abc" at c, we get "ab" and "" (an empty string).

Using this logic, if we split "," on "," we get "" and "" (two empty strings).

You can check it easily using

for (String s: ",".split(",", -1))
    System.out.println("\""+s+"\"");

which will produce

""
""

so, before any trailing empty strings are removed, our result array contains ["", ""]

But we set limit to 0, so all trailing empty strings will be removed. In this case the result array contains only trailing empty strings, so all of them are removed, leaving you with the empty array [] whose length is 0.
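A minimal sketch comparing the "raw" split (limit -1) with the default one (limit 0); it also shows that only trailing empty strings are dropped, while leading ones survive (the class name TrailingEmptyDemo is illustrative):

```java
import java.util.Arrays;

// Splitting "," keeps ["", ""] with limit -1, but limit 0 trims both,
// because every element of that array is a trailing empty string.
class TrailingEmptyDemo {
    public static void main(String[] args) {
        String[] raw = ",".split(",", -1); // ["", ""]
        String[] trimmed = ",".split(",");  // []
        System.out.println(raw.length + " " + trimmed.length); // 2 0

        // Leading empty strings are not trailing, so they are kept.
        System.out.println(Arrays.toString(",,a,,".split(","))); // [, , a]
    }
}
```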


To answer the case

"".split(",").length

you need to understand that removing trailing empty strings makes sense only if such trailing empty strings appeared as a result of splitting and are unwanted. So if there were no places at which we could split, there is no point in running this "cleaning" process. That is why, in the case where the original string was not split (it didn't contain any part matching the regex passed to split), the Java creators decided to return this string as it is.

This is mentioned in the documentation of the split(String regex, int limit) method, where you can read:

If the expression does not match any part of the input then the resulting array has just one element, namely this string.

You can also see this behaviour in the source code of this method (from Java 8):

public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}

where you can find

if (off == 0)
    return new String[]{this};

This fragment means:

  • if (off == 0) - if off (the position from which the next possible match of the regex passed to split is searched) is still 0 after iterating over the entire string, then no match was found, so the string was not split
  • return new String[]{this}; - so let's just return an array containing the current (this) string.

Since "," couldn't be found in "" even once, "".split(",") must return an array with one element (the empty string on which you invoked split). This means that the length of this array is 1.


From the Java 1.7 Documentation

In case 1, the regex in blank.split(",") does not match any part of the input, so the resulting array has just one element, namely this String.

i.e. it returns an array containing the entire string, so the length is 1.

In case 2, comma.split(",") returns an empty array, so the length is 0.


From String class javadoc for the public String[] split(String regex) method:

Splits this string around matches of the given regular expression.

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

In the first case, the expression does not match any part of the input so we got an array with only one element - the input.

In the second case, the expression matches input and split should return two empty strings; but, according to javadoc, they are discarded (because they are trailing and empty).
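The javadoc statement quoted above can be checked on a few inputs with a short sketch (the class name OneArgSplitDemo is illustrative). It compares split(regex) with split(regex, 0) and shows that empty strings in the middle are not discarded, only trailing ones:

```java
import java.util.Arrays;

// For each input, split(",") and split(",", 0) should give identical arrays;
// inner empty strings ("a,,b") are kept, trailing ones ("a,b,,") are dropped.
class OneArgSplitDemo {
    public static void main(String[] args) {
        String[] inputs = {"", ",", "a,,b", "a,b,,"};
        for (String s : inputs) {
            String[] parts = s.split(",");
            boolean same = Arrays.equals(parts, s.split(",", 0));
            System.out.println("\"" + s + "\": length=" + parts.length + ", same=" + same);
        }
        // Prints lengths 1, 0, 3 and 2, each with same=true.
    }
}
```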

    
+1 This is the (umm; counting on my fingers here) sixth answer that says what result gets returned – and the first one that explains why. –  Scott 2 hours ago
String blank = "";                    
String comma = ",";                   
System.out.println("Output1: "+blank.split(",").length);  // case 1
System.out.println("Output2: "+comma.split(",").length);  // case 2

case 1 - Here blank.split(",") returns an array containing "", since there is no "," in blank you get the original string back, so the length is 1

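case 2 - Here comma.split(",") returns an empty array, so the length is 0: the two empty strings around the "," are trailing and get discarded. Escaping the "," would not change this; only a negative limit, as in comma.split(",", -1), keeps them and gives length 2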

Also note that split() expects a regex as its argument and returns the substrings delimited by matches of that regex:

The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string.

Otherwise:

If the expression does not match any part of the input then the resulting array has just one element, namely this string.
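Because split() takes a regular expression, metacharacters matter. A small sketch (the class name RegexSplitDemo is illustrative) contrasts ".", which matches any character, with an escaped or quoted dot, and shows that escaping "," makes no difference because "," has no special meaning in a regex:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RegexSplitDemo {
    public static void main(String[] args) {
        String s = "a.b";
        // "." matches every character, so every piece is empty and all of
        // them are trailing, which leaves an empty array.
        System.out.println(s.split(".").length);                          // 0
        // Escaping or quoting the dot splits on the literal character.
        System.out.println(Arrays.toString(s.split("\\.")));              // [a, b]
        System.out.println(Arrays.toString(s.split(Pattern.quote(".")))); // [a, b]
        // "\\," still matches just a comma, so the result is unchanged.
        System.out.println(",".split("\\,").length);                      // 0
    }
}
```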

Please explain case 1 in more detail –  nobalG 9 hours ago
    
The question is why "" is returned. –  sᴜʀᴇsʜ ᴀᴛᴛᴀ 9 hours ago

The API for the split method states that "If the expression does not match any part of the input then the resulting array has just one element, namely this string."

So, as the String blank doesn't contain a ",", a String[] with one element (i.e. blank itself) is returned.

For the String comma, "nothing" is left of the original string (the empty strings on either side of the "," are trailing and are discarded), thus an empty array is returned.

This seems to be the best behaviour if you want to process the returned result, e.g.

String[] splits = aString.split(",");
for(String split: splits) {
   // do something
}
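To see what this means for a processing loop like the one above, a small sketch can count how many times the loop body runs for each input; countFields and LoopDemo are hypothetical names used only for illustration:

```java
// Counts the iterations of a for-each loop over aString.split(",").
class LoopDemo {
    static int countFields(String aString) {
        int n = 0;
        for (String field : aString.split(",")) {
            n++; // "do something" with each field
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countFields(""));    // 1: the empty string itself
        System.out.println(countFields(","));   // 0: nothing to process
        System.out.println(countFields("a,b")); // 2
    }
}
```

Note that an empty input still yields one (empty) field, so a loop that must skip blank lines needs its own check.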

We can take a look at the source code of java.util.regex.Pattern, which is behind String.split. Way down the rabbit hole, the method

public String[] split(CharSequence input, int limit)

is invoked.

Input ""

For input "" this method is called as

String[] parts = split("", 0);

The interesting part of this method is:

  int index = 0;
  boolean matchLimited = limit > 0;
  ArrayList<String> matchList = new ArrayList<>();
  Matcher m = matcher(input);

  while(m.find()) {
    // Tichodroma: this will not happen for our input
  }

  // If no match was found, return this
  if (index == 0)
    return new String[] {input.toString()};

And that is what happens: new String[] {input.toString()} is returned.

Input ","

For input "," the interesting part is

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);

Here matchList is ["", ""] and limit == 0, so the loop decrements resultSize from 2 to 0 and new String[0] is returned.
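Calling Pattern.split directly reproduces both results. One hedge: on JDK 7 and later, String.split handles a single non-metacharacter such as "," in its own fast path (the source quoted in other answers shows this) instead of going through Pattern, but the results are the same. The class name PatternSplitDemo is illustrative:

```java
import java.util.regex.Pattern;

class PatternSplitDemo {
    public static void main(String[] args) {
        Pattern comma = Pattern.compile(",");
        System.out.println(comma.split("").length);      // 1: no match, input returned
        System.out.println(comma.split(",").length);     // 0: ["", ""] trimmed (limit 0)
        System.out.println(comma.split(",", -1).length); // 2: negative limit keeps ["", ""]
    }
}
```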

    
I believe that your last sentence is an oversimplification, so much so that it cripples the value of your answer. The interesting (i.e., relevant) part is lines 1223-1225. Entering line 1223, resultSize is 2, because matchList is { "", "" }. But, because limit is 0 (the default when split is called with only one parameter), the loop at lines 1224-1225 gets invoked, and it iterates twice, discarding the two empty strings and decrementing resultSize to 0. –  Scott 2 hours ago

From JDK 1.7

 public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
           (1)one-char String and this character is not one of the
              RegEx's meta characters ".$|()[{^?*+\\", or
           (2)two-char String and the first char is the backslash and
              the second is not the ascii digit or ascii letter.
        */
        char ch = 0;
        if (((regex.count == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, count));
                    off = count;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[] { this };

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, count));

            // Construct result
            int resultSize = list.size();
            if (limit == 0)
                while (resultSize > 0 && list.get(resultSize-1).length() == 0)
                    resultSize--;
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }

So for this case, the regex will be handled by the first if.

For the first case blank.split(",")

// If no match was found, return this
if (off == 0)
   return new String[] { this };

So this function returns an array containing one element if there is no match.

For the second case comma.split(",")

List<String> list = new ArrayList<>();
//...
int resultSize = list.size();
if (limit == 0)
    while (resultSize > 0 && list.get(resultSize-1).length() == 0)
        resultSize--;
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);

As you can see, the final while loop removes all empty elements at the end of the list, so resultSize becomes 0.
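The "Construct result" step can be isolated in a small sketch. trimTrailingEmpty is a hypothetical helper, not a JDK method; it copies the logic of the limit == 0 loop above, which walks back from the end and stops at the first non-empty element:

```java
import java.util.Arrays;
import java.util.List;

class TrimDemo {
    // Hypothetical helper mirroring the limit == 0 trimming loop.
    static String[] trimTrailingEmpty(List<String> pieces) {
        int resultSize = pieces.size();
        while (resultSize > 0 && pieces.get(resultSize - 1).isEmpty()) {
            resultSize--;
        }
        return pieces.subList(0, resultSize).toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The pieces produced for "," before trimming: both are empty.
        System.out.println(trimTrailingEmpty(Arrays.asList("", "")).length); // 0
        // Only the tail is trimmed; the leading "" survives.
        System.out.println(Arrays.toString(
                trimTrailingEmpty(Arrays.asList("", "a", "", "")))); // [, a]
    }
}
```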


The point is that the "split" function splits the string into parts separated by the specified symbol (in this case ","), and the length property returns the number of such parts.

So you have 2 lines, blank ("") and comma (","), and the program just counts how many comma-separated parts each line contains - it's 1 (for string ",") and 0 (for string "").

    
The question is "I do not understand the output of the program:"; I described the behaviour and did it correctly. If you disagree with that, describe it yourself and let's compare! –  Alek Depler 9 hours ago
"The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string." Isn't that what I said? –  Alek Depler 9 hours ago
You write 'it's 1 (for string ",") and 0 (for string "")'. That's not what happens. –  Tichodroma 9 hours ago
