Background
This question was brought to my attention in The 2nd Monitor chat room because in the past I have claimed that using exception handling to handle parse exceptions is "a bad idea and slow". This is exactly what your code is doing, and it's a bad idea, and slow.... at least, that's what I thought, until I benchmarked your code.
Now, in the past, I wrote a CSV parser and it used a similar system to yours to handle the values in a field, and I discovered that I got a significant speed-up (like 100% faster) when I prevalidated the values to an large extent, before doing a parseInt
or parseDouble
on them. I found that it is much better to "identify" a value of a certain type to a high degree of confidence, and thus reduce the number of exceptions thrown.
In your code, if the values are 1/3 integers, 1/3 double, and 1/3 string, then on average you are creating 1 exception for each value (none for ints, 1 for doubles, and 2 for strings). Worst case, if all your values are strings, you'll create 2 exceptions per value.
What if you could (almost) guarantee that all your parseInt
and parseDouble
calls will succeed, and you'll have (almost) no exceptions? Is the work to check the value "worth it"?
My claim is yes, it's worth it.
So, I have tried to prove it, and ... the results are interesting.
I used my MicroBench performance system to run the benchmark, and I built a dummy "load" for the doFoo
function. Let's look at my test-rig:
public class ParseVal {
private final LongAdder intsums = new LongAdder();
private final DoubleAdder doubsums = new DoubleAdder();
private final LongAdder stringsums = new LongAdder();
private final void doFoo(int val) {
intsums.add(val);
}
private final void doFoo(double val) {
doubsums.add(val);
}
private final void doFoo(String val) {
stringsums.add(val.length());
}
@Override
public String toString() {
return String.format("IntSum %d - DoubleSum %.9f - StringLen %d", intsums.longValue(), doubsums.doubleValue(), stringsums.longValue());
}
public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
ParseVal pv = new ParseVal();
for (String v : data) {
fn.accept(pv, v);
}
return pv.toString();
}
public static final String[] testData(int count) {
String[] data = new String[count];
Random rand = new Random(count);
for (int i = 0; i < count; i++) {
String base = String.valueOf(1000000000 - rand.nextInt(2000000000));
switch(i % 3) {
case 0:
break;
case 1:
base += "." + rand.nextInt(10000);
break;
case 2:
base += "foobar";
break;
}
data[i] = base;
}
return data;
}
.......
public void testStringOP(String val) {
String x = val.trim();
try {
int i = Integer.parseInt(x);
doFoo(i);
} catch (NumberFormatException e) {
try {
double d = Double.parseDouble(x);
doFoo(d);
} catch (NumberFormatException e2) {
doFoo(x);
}
}
}
public static void main(String[] args) {
String[] data = testData(1000);
String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
System.out.println(expect);
....
}
}
The doFoo
methods have an accumulator mechanism (adding up ints, doubles, and the string lengths) and making the results available in a toString
method.
Also, I have put your function in there as testStringOP
.
There is a testData
function which builds an array if input strings where there are approximately equal numbers of int, double, and string values.
Finally, the benchmark function:
public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
ParseVal pv = new ParseVal();
for (String v : data) {
fn.accept(pv, v);
}
return pv.toString();
}
That function takes an input function and the test data as an argument, and returns the String summary as a result. You would use this function like it's used in the main method....
String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
which runs the testStringOP
function on all the input data
values, and returns the accumulated string results.
What's nice is that I can now create other functions to test performance, for example testStringMyFn
and call:
String myresult = testFunction((pv, v) -> pv.testStringMyFn(v), data);
This is the basic tool I can use for the MicroBench system: https://github.com/rolfl/MicroBench
Scanner option
Let's start by comparing your function to the Scanner
type system recommended in another answer... Here's the code I used for the Scanner
:
public void testStringScanner(String val) {
val = val.trim();
try (Scanner scanner = new Scanner(val)) {
if (scanner.hasNextInt()) {
doFoo(scanner.nextInt());
} else if (scanner.hasNextDouble()) {
doFoo(scanner.nextDouble());
} else {
doFoo(val);
}
}
}
and here's how I benchmarked that code:
public static void main(String[] args) {
String[] data = testData(1000);
String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
System.out.println(expect);
UBench bench = new UBench("IntDoubleString Parser")
.addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
.addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
bench.press(10).report("Warmup");
bench.press(100).report("Final");
}
That runs the benchmark on both your function, and the Scanner function, and does a warmup run (to get JIT optimzations done), and a "Final" run to get real results.... what are the results, you ask?
Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
Count : 100 Average : 1.6914
Fastest : 1.5331 Slowest : 3.2561
95Pctile : 2.0277 99Pctile : 3.2561
TimeBlock : 1.794 2.037 1.674 1.654 1.674 1.588 1.665 1.588 1.634 1.606
Histogram : 99 1
Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
Count : 100 Average : 69.9713
Fastest : 67.2338 Slowest : 98.4322
95Pctile : 73.8073 99Pctile : 98.4322
TimeBlock : 77.028 70.050 69.325 69.860 69.094 68.498 68.547 68.779 69.586 68.945
Histogram : 100
What does that mean? It means, on average, your code is 40-times faster than the Scanner. Your code runs in 1.7Milliseconds to process 1000 input values, and the scanner runs in 70 milliseconds.
So, a Scanner
is a bad idea if performance is required, right? I agree.
Alternative
But, what about a RegEx pre-validation check? Note that the regex will not guarantee a clean parse, but it can go a long way. For example, the regex [+-]?\d+
will match any integer, right, but is -999999999999999999999
a valid integer? No, it's too big. But, it is a valid double. We will still need to have a try/catch block even if we pass the regex prevalidation. That's going to eliminate almost all exceptions, though....
So, what do we do to prevalidate things? Well, the Double.valueOf(String)
function documents a regex for matching double values in Strings. It's complicated, and I made a few modifications because we don't have already trimmed our inputs, but here's a couple of patterns for prevalidating double values, and integer values:
private static final String Digits = "(\\p{Digit}+)";
private static final String HexDigits = "(\\p{XDigit}+)";
private static final String Exp = "[eE][+-]?"+Digits;
private static final String fpRegex =
( //"[\\x00-\\x20]*"+ // Optional leading "whitespace"
"[+-]?(" + // Optional sign character
"NaN|" + // "NaN" string
"Infinity|" + // "Infinity" string
"((("+Digits+"(\\.)?("+Digits+"?)("+Exp+")?)|"+
"(\\.("+Digits+")("+Exp+")?)|"+
"((" +
"(0[xX]" + HexDigits + "(\\.)?)|" +
"(0[xX]" + HexDigits + "?(\\.)" + HexDigits + ")" +
")[pP][+-]?" + Digits + "))" +
"[fFdD]?))"); // +
//"[\\x00-\\x20]*");// Optional trailing "whitespace"
Pattern isDouble = Pattern.compile(fpRegex);
Pattern isInteger = Pattern.compile("[+-]?[0-9]+");
We can use those functions to build the code:
public void testStringRegex(String val) {
String x = val.trim();
if (isInteger.matcher(x).matches()) {
try {
doFoo(Integer.parseInt(x));
} catch (NumberFormatException nfe) {
try {
doFoo(Double.parseDouble(x));
} catch (NumberFormatException e) {
doFoo(x);
}
}
} else if (isDouble.matcher(x).matches()) {
try {
doFoo(Double.parseDouble(x));
} catch (NumberFormatException e) {
doFoo(x);
}
} else {
doFoo(x);
}
}
Now, that's pretty complicated, right? Well, it does a "quick" integer regex check, and if it's likely an integer, it tries to parse it as an integer, and fails over to a double, and then to a string....
If it's not likely an integer, it checks if it's a double, and so on.....
How can this code be faster, you ask? Well, we're almost certainly having clean parses when we do them, and we'll have almost no exceptions... But, is it actually faster?
Here are the results:
Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
Count : 100 Average : 1.6689
Fastest : 1.5580 Slowest : 2.1572
95Pctile : 1.8012 99Pctile : 2.1572
TimeBlock : 1.695 1.752 1.709 1.670 1.641 1.648 1.643 1.639 1.662 1.630
Histogram : 100
Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS)
Count : 100 Average : 1.9580
Fastest : 1.8379 Slowest : 2.5713
95Pctile : 2.1004 99Pctile : 2.5713
TimeBlock : 1.978 2.022 1.949 1.966 2.020 1.933 1.890 1.940 1.955 1.928
Histogram : 100
Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
Count : 100 Average : 69.8886
Fastest : 67.1848 Slowest : 77.2769
95Pctile : 71.9153 99Pctile : 77.2769
TimeBlock : 70.940 69.735 69.879 69.381 69.579 69.180 69.611 70.412 70.123 70.045
Histogram : 100
If you look, you'll see the regex version is Slower than the exception version... it runs in 1.95ms but the exception version runs in 1.67ms
Exceptions
But, there's a catch. In these tests, the stack trace for the exceptions is really small... and the "cost" of an exception depends on the depth of the trace, so let's increase the stack depths for the regex and exception code. Well add a recursive function to simulate a deeper stack:
public void testStringDeepOP(String val, int depth) {
if (depth <= 0) {
testStringOP(val);
} else {
testStringDeepOP(val, depth - 1);
}
}
public void testStringDeepRegex(String val, int depth) {
if (depth <= 0) {
testStringRegex(val);
} else {
testStringDeepRegex(val, depth - 1);
}
}
and we will test the OP and Regex code a different "depths" of nesting, 5, 10, and 20 layers deep. The benchmark code is:
UBench bench = new UBench("IntDoubleString Parser")
.addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
.addTask("OP D5", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 5), data), s -> expect.equals(s))
.addTask("OP D10", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 10), data), s -> expect.equals(s))
.addTask("OP D20", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 20), data), s -> expect.equals(s))
.addTask("Regex", () -> testFunction((pv, v) -> pv.testStringRegex(v), data), s -> expect.equals(s))
.addTask("Regex D5", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 5), data), s -> expect.equals(s))
.addTask("Regex D10", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 10), data), s -> expect.equals(s))
.addTask("Regex D20", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 20), data), s -> expect.equals(s))
.addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
bench.press(10).report("Warmup");
bench.press(100).report("Final");
What are the results?
Final
=====
Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
Count : 100 Average : 1.7005
Fastest : 1.5260 Slowest : 3.9813
95Pctile : 1.9346 99Pctile : 3.9813
TimeBlock : 1.682 1.624 1.612 1.675 1.708 1.658 1.727 1.738 1.672 1.910
Histogram : 99 1
Task IntDoubleString Parser -> OP D5: (Unit: MILLISECONDS)
Count : 100 Average : 1.9288
Fastest : 1.7325 Slowest : 4.9673
95Pctile : 2.0897 99Pctile : 4.9673
TimeBlock : 2.124 1.812 1.828 1.873 1.925 1.877 1.855 1.869 1.903 2.221
Histogram : 98 2
Task IntDoubleString Parser -> OP D10: (Unit: MILLISECONDS)
Count : 100 Average : 2.2271
Fastest : 2.0171 Slowest : 4.7395
95Pctile : 2.4904 99Pctile : 4.7395
TimeBlock : 2.392 2.125 2.129 2.152 2.246 2.169 2.189 2.203 2.247 2.420
Histogram : 98 2
Task IntDoubleString Parser -> OP D20: (Unit: MILLISECONDS)
Count : 100 Average : 2.9278
Fastest : 2.6838 Slowest : 6.3169
95Pctile : 3.2415 99Pctile : 6.3169
TimeBlock : 2.870 2.822 2.860 2.794 2.956 2.861 3.041 3.012 2.853 3.211
Histogram : 99 1
Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS)
Count : 100 Average : 2.0739
Fastest : 1.9338 Slowest : 3.8368
95Pctile : 2.2744 99Pctile : 3.8368
TimeBlock : 2.229 2.083 2.034 2.013 2.021 2.004 2.013 2.096 2.059 2.186
Histogram : 100
Task IntDoubleString Parser -> Regex D5: (Unit: MILLISECONDS)
Count : 100 Average : 2.0565
Fastest : 1.9377 Slowest : 3.2857
95Pctile : 2.2646 99Pctile : 3.2857
TimeBlock : 2.148 2.075 2.035 2.038 2.035 2.031 2.026 2.000 2.032 2.145
Histogram : 100
Task IntDoubleString Parser -> Regex D10: (Unit: MILLISECONDS)
Count : 100 Average : 2.0647
Fastest : 1.9598 Slowest : 2.6360
95Pctile : 2.2906 99Pctile : 2.6360
TimeBlock : 2.073 2.094 2.051 2.048 2.072 2.029 2.057 2.124 2.057 2.042
Histogram : 100
Task IntDoubleString Parser -> Regex D20: (Unit: MILLISECONDS)
Count : 100 Average : 2.0891
Fastest : 1.9930 Slowest : 2.6483
95Pctile : 2.2587 99Pctile : 2.6483
TimeBlock : 2.108 2.070 2.078 2.066 2.071 2.091 2.048 2.090 2.137 2.132
Histogram : 100
Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
Count : 100 Average : 71.7199
Fastest : 67.9621 Slowest : 152.0714
95Pctile : 75.2141 99Pctile : 152.0714
TimeBlock : 71.006 69.896 70.160 69.734 70.824 69.854 71.473 71.888 73.607 78.756
Histogram : 99 1
Here it is expressed as a table (using the average times):
0 5 10 20
OP 1.7005 1.9288 2.2271 2.9278
RegEx 2.0739 2.0565 2.0647 2.0891
Conclusion
So, that's the real problem with exceptions, the performance is unpredictable... and, for example, if you run it inside a Tomcat container, with stacks hundreds of levels deep, you may find this completely destroys your performance.
doFoo
is really doing? – Simon Forsberg♦ 21 hours agodoFoo
to call? This is unlike a language like Python. – Dair 21 hours ago