Everything happens according to plan, but let's go through it step by step (I hope you have some time).
According to the documentation (and source code) of the split(String regex) method:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero.
So when you invoke split(String regex), you are actually getting the result of the split(String regex, int limit) method, invoked as split(regex, 0). So here limit is set to 0.
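You can check this delegation yourself by calling both forms on the same input. This is a small sketch; the input "a,b,," and the class name SplitDelegation are arbitrary choices for illustration.

```java
import java.util.Arrays;

public class SplitDelegation {
    public static void main(String[] args) {
        String s = "a,b,,";
        // One-argument form...
        String[] oneArg = s.split(",");
        // ...should give the same result as the two-argument form with limit 0
        String[] limitZero = s.split(",", 0);
        System.out.println(Arrays.toString(oneArg));           // [a, b]
        System.out.println(Arrays.equals(oneArg, limitZero));  // true
    }
}
```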
You need to know a few things about this parameter:
- If limit is positive, you are limiting the length of the result array to the positive number you specified, so "axaxaxaxa".split("x",2) will return the array ["a", "axaxaxa"], not ["a","a","a","a","a"].
- If limit is 0, you are not limiting the length of the result array, but it also means that any trailing empty strings will be removed. For example, "fooXbarX".split("X") will at first generate an array that looks like ["foo", "bar", ""] ("barX" split on "X" generates "bar" and ""), but since split removes all trailing empty strings, it will return ["foo", "bar"].
- The behaviour of a negative limit is similar to the behaviour when limit is set to 0 (it will not limit the length of the result array). The only difference is that it will not remove empty strings from the end of the result array. In other words, "fooXbarX".split("X",-1) will return ["foo", "bar", ""].
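The three cases above can be seen side by side with a short program like the following (a sketch; the class name SplitLimits is only for illustration):

```java
import java.util.Arrays;

public class SplitLimits {
    public static void main(String[] args) {
        // Positive limit: at most 2 elements, the rest stays in the last one
        System.out.println(Arrays.toString("axaxaxaxa".split("x", 2)));  // [a, axaxaxa]
        // Zero limit: no length limit, trailing empty strings removed
        System.out.println(Arrays.toString("fooXbarX".split("X", 0)));   // [foo, bar]
        // Negative limit: no length limit, trailing empty strings kept
        System.out.println(Arrays.toString("fooXbarX".split("X", -1)));  // [foo, bar, ]
    }
}
```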
I will try to answer the first case, ",".split(",").length, which, as explained earlier, is effectively the same as ",".split(",", 0).length. This means that we are using the version of split which will not limit the length of the result array, BUT will remove all trailing empty strings "".
You need to understand that when we split ONE thing at one place, we always get TWO things. In other words, if we split "abc" on b, we will get "a" and "c".
The tricky part is to understand that if we split "abc" on c, we will get "ab" and "" (an empty String).
Using this logic, if we split "," on ",", we will get "" and "" (two empty strings).
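The two "abc" splits described above look like this in code. This is a small sketch; the negative limit in the second call is only there so that the trailing empty string is kept and becomes visible in the output.

```java
import java.util.Arrays;

public class SplitPieces {
    public static void main(String[] args) {
        // One match in the middle: two non-empty pieces
        System.out.println(Arrays.toString("abc".split("b")));      // [a, c]
        // One match at the end: the second piece is the empty string
        System.out.println(Arrays.toString("abc".split("c", -1)));  // [ab, ]
    }
}
```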
You can check it easily using
for (String s : ",".split(",", -1))
    System.out.println("\"" + s + "\"");
which will produce
""
""
so our result array contains ["", ""].
But we set limit to 0, so we decided that all trailing empty strings will be removed. In this case the result array contains only trailing empty strings, so all of them will be removed, leaving you with an empty array [] whose length is 0.
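One way to picture what limit 0 does here is as two steps: split without dropping anything, then remove trailing empty strings. The sketch below models the second step with a hypothetical helper, dropTrailingEmpty, which is not part of the JDK. Note that this two-step model only matches split(",") when the delimiter actually occurs in the input; it does not hold for "".split(","), where no split happens at all.

```java
import java.util.Arrays;

public class TrailingEmpty {
    // Hypothetical helper (not a JDK method): removes trailing empty strings,
    // mimicking what split does after splitting when limit == 0
    public static String[] dropTrailingEmpty(String[] parts) {
        int size = parts.length;
        while (size > 0 && parts[size - 1].isEmpty()) {
            size--;
        }
        return Arrays.copyOf(parts, size);
    }

    public static void main(String[] args) {
        String[] raw = ",".split(",", -1);          // ["", ""]
        String[] trimmed = dropTrailingEmpty(raw);  // []
        System.out.println(raw.length);             // 2
        System.out.println(trimmed.length);         // 0
        System.out.println(",".split(",").length);  // 0
    }
}
```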
To answer the case "".split(",").length, you need to understand that removing trailing empty strings makes sense only if such trailing empty strings appeared as a result of splitting and are unwanted. So if there weren't any places at which we could split, there is no point in running this "cleaning" process. That is why, in the case where the original string was not split (it didn't contain any part which matched the regex passed as the split argument), Java's designers decided to return this string as it is.
This is stated in the documentation of the split(String regex, int limit) method, where you can read:
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
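A short sketch of that rule in action (the inputs are arbitrary; "," simply never occurs in them, and the class name NoMatch is only for illustration):

```java
import java.util.Arrays;

public class NoMatch {
    public static void main(String[] args) {
        // "," never occurs, so each call returns a one-element array
        // holding the original string, even when that string is empty
        System.out.println(Arrays.toString("abc".split(",")));  // [abc]
        System.out.println("".split(",").length);               // 1
        System.out.println("".split(",")[0].isEmpty());          // true
    }
}
```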
You can also see this behaviour in the source code of this method (from Java 8):
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
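To experiment with the fast-path logic outside java.lang.String, here is a simplified standalone copy of it. This is a sketch under stated assumptions: the method name splitOnChar is hypothetical, it takes the delimiter as a plain char, and it skips the regex metacharacter checks and the Pattern fallback entirely, so it only models splitting on a single ordinary character.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Hypothetical helper: a simplified copy of the JDK 8 fast path
    // for a single, non-metacharacter delimiter (not the real String.split)
    public static String[] splitOnChar(String s, char ch, int limit) {
        int off = 0;
        int next;
        boolean limited = limit > 0;
        List<String> list = new ArrayList<>();
        while ((next = s.indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(s.substring(off, next));
                off = next + 1;
            } else {
                // Last allowed piece: take the rest of the string
                list.add(s.substring(off));
                off = s.length();
                break;
            }
        }
        // No delimiter found: return the original string as the only element
        if (off == 0) {
            return new String[]{s};
        }
        // Add the segment after the last delimiter
        if (!limited || list.size() < limit) {
            list.add(s.substring(off));
        }
        int resultSize = list.size();
        if (limit == 0) {
            // Drop trailing empty strings, but only when limit == 0
            while (resultSize > 0 && list.get(resultSize - 1).isEmpty()) {
                resultSize--;
            }
        }
        return list.subList(0, resultSize).toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(splitOnChar("", ',', 0).length);    // 1
        System.out.println(splitOnChar(",", ',', 0).length);   // 0
        System.out.println(splitOnChar(",", ',', -1).length);  // 2
    }
}
```

Tracing the two calls from the question through this sketch: for "" the while loop never runs, so off stays 0 and the one-element array comes back before any trimming; for "," the loop runs once, off becomes 1, and the two empty pieces are then trimmed away because limit is 0.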
where you can find the fragment
if (off == 0)
    return new String[]{this};
which means:
- if (off == 0): if off (the position from which the search for the next possible match of the regex passed to split starts) is still 0 after iterating over the entire string, then we didn't find any match, so the string was not split;
- return new String[]{this};: so let's just return an array containing the current (this) string.
Since "," couldn't be found in "" even once, "".split(",") must return an array with one element (the empty string on which you invoked split). This means that the length of this array is 1.