Determine if an input is an integer, a double or a String

Question

I created a method to test whether the specified input is a double, int or String and then pass it to a 3rd party library which has the methods doFoo which accepts either a String, double or int. This is the method that I would like feedback on:

public static void testString(String val) { 
    System.out.print("Original '" + val + "'  ");
    String x = val.trim();
    try {
        int i = Integer.parseInt(x);
        System.out.println("It's an integer: " + i);
        doFoo(i);
    } catch (NumberFormatException e) {
        try {
            double d = Double.parseDouble(x);
            System.out.println("It's a double: " + d);
            doFoo(d);
        } catch (NumberFormatException e2) {
            System.out.println("It's a String: " + x);
            doFoo(x);
        }
    }
}

Is this good code? Could it be improved? I don't like the throwing and catching of Exceptions.

Some test code to prove it works:

testString("N/A");
testString("19.");
testString("19.0");
testString("19.4");
testString("  1 ");
testString(" 1");
testString("1 ");
testString("1");
testString("    ");

Results in:

Original 'N/A'  It's a String: N/A
Original '19.'  It's a double: 19.0
Original '19.0'  It's a double: 19.0
Original '19.4'  It's a double: 19.4
Original '  1 '  It's an integer: 1
Original ' 1'  It's an integer: 1
Original '1 '  It's an integer: 1
Original '1'  It's an integer: 1
Original '    '  It's a String:

It's Java, so it's OK if it's verbose. You probably don't have a better choice. — kyrill, 21 hours ago
What is the use-case for this? What is it that doFoo is really doing? — Simon Forsberg♦, 21 hours ago
You do realize that if you redefine foo to have a different signature it the compiler will determine which doFoo to call? This is unlike a language like Python. — Dair, 21 hours ago
doFoo is actually part of Apache-Poi. It is setting an Excel cell value and there are 3 methods which are for double, String and int. — Phil, 17 hours ago

VinPro · Answer 1 · 2017-03-31 21:01:16Z

up vote 4 down vote

Is this good code?

Yes, except for using System.out.println statements for logging.

Could it be improved? I don't like the throwing and catching of Exceptions.

There isn't much to improve other than logging and some design improvements @Justin suggested. Your code is better performance wise than using Scanner methods. The Scanner methods have some overhead trying to validate the input and ultimately calls respective parse methods. If you were to use above code in a high performance application than no further improvements needed.

answered 18 hours ago

VinPro

411

Gah! I knew I should not have left those System.outs in! I do this for my own debugging (I am very old school) and I felt that it helped in the example. Consider the System.outs irrelevant. I was concerned that the throwing and catching was expensive and that there would be some other way (Apache Commons?) that would provide me with the solution I required. – Phil 17 hours ago

Note that if your requirement is strict on not parsing Long numbers as Double you've to handle that case and not parse it as Double as you've done. – VinPro 14 hours ago

add a comment |

Justin · Answer 2 · 2017-03-31 23:45:22Z

I don't like the throwing and catching of Exceptions

This can be made much cleaner with the use of a Scanner. It might not be the most performant way, but it's fast and easy to use.

try (Scanner scanner = new Scanner(x)) {
    if (scanner.hasNextInt()) doFoo(scanner.nextInt());
    else if (scanner.hasNextDouble()) doFoo(scanner.nextDouble());
    else doFoo(x);
}

However, if this is going to be called hundreds of thousands of times, the try catch method might be faster, though you should encapsulate those into their own functions. You'd need to profile to be sure which is faster, but I believe it would be this because Scanner.hasNextFoo uses regular expressions:

public static boolean isInteger(String str) {
    try {
        Integer.parse(str);
        return true;
    } catch (NumberFormatException e) {
        return false;
    }
}

Also, your function is doing multiple things: printing/reporting, and parsing/forwarding to doFoo. This is not a good thing. I'd recommend removing those and handling them where it's more appropriate:

public static void testString(String val) { 
    String x = val.trim();

    try (Scanner scanner = new Scanner(x)) {
        if (scanner.hasNextInt()) doFoo(scanner.nextInt());
        else if (scanner.hasNextDouble()) doFoo(scanner.nextDouble());
        else doFoo(x);
    }
}

That was much shorter. Now if you wanted the same functionality, it would look like so:

public static void testTestString(String val) {
     System.out.print("Original '" + val + "'  ");
     testString(val);
}

// ...

public static void doFoo(int i) {
    System.out.println("It's an integer: " + i);
    // ...
}

If you want your code to be extremely extensible, there is another way. Notice how the new function I suggested still does multiple things:

It detects the type of the string
It parses the value from the string
It forwards the value on to another function

We can separate these into their own components.

This is only really worth it if you can foresee adding types to be a common feature, but especially if the "another function" you forward to should be selectable by the user (say, if you packaged these functions as member functions of an object):

// Class is the easiest type we can return
private static Class<?> determineType(String val) {
    try (Scanner scanner = new Scanner(val)) {
        if (scanner.hasNextInt()) return Integer.class;
        if (scanner.hasNextDouble()) return Double.class;
        return String.class;
    }
}

private static final Map<Class<?>, Function<String, ?>> parsers = new IdentityHashMap<>();
private static final Map<Class<?>, Consumer<Object>> functionSwitch = new IdentityHashMap<>();

static {
    parsers.put(Integer.class, Integer::parseInt);
    parsers.put(Double.class, Double::parseDouble);
    parsers.put(String.class, Function.identity());

    // Note that, due to limitations in the type system,
    // i is of type Object, so we need to cast it to the appropriate
    // class before forwarding on to the function.
    functionSwitch.put(Integer.class, i -> doFoo((Integer) i));
    functionSwitch.put(Double.class, d -> doFoo((Double) d));
    functionSwitch.put(String.class, str -> doFoo((String) str));
}

public static void testString(String val) {
    val = val.trim(); // This could even be part of the parser's responsibility
    Class<?> stringType = determineType(val);
    Function<String, ?> parser = parsers.get(stringType);
    functionSwitch.get(stringType).accept(parser.apply(val));
}

Very nice answer. Unfortunately I should have said that performance is key here as my method may be called thousands of times. I was worried about the try / catch being un-performant. However, thank you for the detailed answer - I've found it illuminating — Phil, 17 hours ago
@Phil Even for thousands of time, Scanner is quite fast. Assuming it is used for input (which is already slow anyway), Scanner is not what is going to be taking up all your time. But the try {} catch way is about 15x faster — Justin, 15 hours ago

rolfl · Answer 3 · 2017-04-01 14:27:36Z

Background

This question was brought to my attention in The 2nd Monitor chat room because in the past I have claimed that using exception handling to handle parse exceptions is "a bad idea and slow". This is exactly what your code is doing, and it's a bad idea, and slow.... at least, that's what I thought, until I benchmarked your code.

Now, in the past, I wrote a CSV parser and it used a similar system to yours to handle the values in a field, and I discovered that I got a significant speed-up (like 100% faster) when I prevalidated the values to an large extent, before doing a parseInt or parseDouble on them. I found that it is much better to "identify" a value of a certain type to a high degree of confidence, and thus reduce the number of exceptions thrown.

In your code, if the values are 1/3 integers, 1/3 double, and 1/3 string, then on average you are creating 1 exception for each value (none for ints, 1 for doubles, and 2 for strings). Worst case, if all your values are strings, you'll create 2 exceptions per value.

What if you could (almost) guarantee that all your parseInt and parseDouble calls will succeed, and you'll have (almost) no exceptions? Is the work to check the value "worth it"?

My claim is yes, it's worth it.

So, I have tried to prove it, and ... the results are interesting.

I used my MicroBench performance system to run the benchmark, and I built a dummy "load" for the doFoo function. Let's look at my test-rig:

public class ParseVal {

    private final LongAdder intsums = new LongAdder();
    private final DoubleAdder doubsums = new DoubleAdder();
    private final LongAdder stringsums = new LongAdder();

    private final void doFoo(int val) {
        intsums.add(val);
    }

    private final void doFoo(double val) {
        doubsums.add(val);
    }

    private final void doFoo(String val) {
        stringsums.add(val.length());
    }

    @Override
    public String toString() {
        return String.format("IntSum %d - DoubleSum %.9f - StringLen %d", intsums.longValue(), doubsums.doubleValue(), stringsums.longValue());
    }

    public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
        ParseVal pv = new ParseVal();
        for (String v : data) {
            fn.accept(pv, v);
        }
        return pv.toString();
    }

    public static final String[] testData(int count) {
        String[] data = new String[count];
        Random rand = new Random(count);
        for (int i = 0; i < count; i++) {
            String base = String.valueOf(1000000000 - rand.nextInt(2000000000));
            switch(i % 3) {
                case 0:
                    break;
                case 1:
                    base += "." + rand.nextInt(10000);
                    break;
                case 2:
                    base += "foobar";
                    break;
            }
            data[i] = base;
        }
        return data;
    }

    .......


    public void testStringOP(String val) { 
        String x = val.trim();
        try {
            int i = Integer.parseInt(x);
            doFoo(i);
        } catch (NumberFormatException e) {
            try {
                double d = Double.parseDouble(x);
                doFoo(d);
            } catch (NumberFormatException e2) {
                doFoo(x);
            }
        }
    }

    public static void main(String[] args) {
        String[] data = testData(1000);
        String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
        System.out.println(expect);

        ....
    }

}

The doFoo methods have an accumulator mechanism (adding up ints, doubles, and the string lengths) and making the results available in a toString method.

Also, I have put your function in there as testStringOP.

There is a testData function which builds an array if input strings where there are approximately equal numbers of int, double, and string values.

Finally, the benchmark function:

public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
    ParseVal pv = new ParseVal();
    for (String v : data) {
        fn.accept(pv, v);
    }
    return pv.toString();
}

That function takes an input function and the test data as an argument, and returns the String summary as a result. You would use this function like it's used in the main method....

String expect = testFunction((pv, v) -> pv.testStringOP(v), data);

which runs the testStringOP function on all the input data values, and returns the accumulated string results.

What's nice is that I can now create other functions to test performance, for example testStringMyFn and call:

String myresult = testFunction((pv, v) -> pv.testStringMyFn(v), data);

This is the basic tool I can use for the MicroBench system: https://github.com/rolfl/MicroBench

Scanner option

Let's start by comparing your function to the Scanner type system recommended in another answer... Here's the code I used for the Scanner:

public void testStringScanner(String val) {
    val = val.trim();
    try (Scanner scanner = new Scanner(val)) {
        if (scanner.hasNextInt()) {
            doFoo(scanner.nextInt());
        } else if (scanner.hasNextDouble()) {
            doFoo(scanner.nextDouble());
        } else {
            doFoo(val);
        }
    }
}

and here's how I benchmarked that code:

public static void main(String[] args) {
    String[] data = testData(1000);
    String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
    System.out.println(expect);

    UBench bench = new UBench("IntDoubleString Parser")
        .addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
        .addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
    bench.press(10).report("Warmup");
    bench.press(100).report("Final");
}

That runs the benchmark on both your function, and the Scanner function, and does a warmup run (to get JIT optimzations done), and a "Final" run to get real results.... what are the results, you ask?

Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
  Count    :      100      Average  :   1.6914
  Fastest  :   1.5331      Slowest  :   3.2561
  95Pctile :   2.0277      99Pctile :   3.2561
  TimeBlock : 1.794 2.037 1.674 1.654 1.674 1.588 1.665 1.588 1.634 1.606
  Histogram :    99     1

Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
  Count    :      100      Average  :  69.9713
  Fastest  :  67.2338      Slowest  :  98.4322
  95Pctile :  73.8073      99Pctile :  98.4322
  TimeBlock : 77.028 70.050 69.325 69.860 69.094 68.498 68.547 68.779 69.586 68.945
  Histogram :   100

What does that mean? It means, on average, your code is 40-times faster than the Scanner. Your code runs in 1.7Milliseconds to process 1000 input values, and the scanner runs in 70 milliseconds.

So, a Scanner is a bad idea if performance is required, right? I agree.

Alternative

But, what about a RegEx pre-validation check? Note that the regex will not guarantee a clean parse, but it can go a long way. For example, the regex [+-]?\d+ will match any integer, right, but is -999999999999999999999 a valid integer? No, it's too big. But, it is a valid double. We will still need to have a try/catch block even if we pass the regex prevalidation. That's going to eliminate almost all exceptions, though....

So, what do we do to prevalidate things? Well, the Double.valueOf(String) function documents a regex for matching double values in Strings. It's complicated, and I made a few modifications because we don't have already trimmed our inputs, but here's a couple of patterns for prevalidating double values, and integer values:

private static final String Digits     = "(\\p{Digit}+)";
private static final String HexDigits  = "(\\p{XDigit}+)";
private static final String Exp        = "[eE][+-]?"+Digits;
private static final String fpRegex    =
    ( //"[\\x00-\\x20]*"+  // Optional leading "whitespace"
     "[+-]?(" + // Optional sign character
     "NaN|" +           // "NaN" string
     "Infinity|" +      // "Infinity" string
     "((("+Digits+"(\\.)?("+Digits+"?)("+Exp+")?)|"+
     "(\\.("+Digits+")("+Exp+")?)|"+
     "((" +
      "(0[xX]" + HexDigits + "(\\.)?)|" +
      "(0[xX]" + HexDigits + "?(\\.)" + HexDigits + ")" +
      ")[pP][+-]?" + Digits + "))" +
     "[fFdD]?))"); // +
     //"[\\x00-\\x20]*");// Optional trailing "whitespace"

Pattern isDouble = Pattern.compile(fpRegex);
Pattern isInteger = Pattern.compile("[+-]?[0-9]+");

We can use those functions to build the code:

public void testStringRegex(String val) { 
    String x = val.trim();
    if (isInteger.matcher(x).matches()) {
        try {
            doFoo(Integer.parseInt(x));
        } catch (NumberFormatException nfe) {
            try {
                doFoo(Double.parseDouble(x));
            } catch (NumberFormatException e) {
                doFoo(x);
            }
        }
    } else if (isDouble.matcher(x).matches()) {
        try {
            doFoo(Double.parseDouble(x));
        } catch (NumberFormatException e) {
            doFoo(x);
        }
    } else {
        doFoo(x);
    }
}

Now, that's pretty complicated, right? Well, it does a "quick" integer regex check, and if it's likely an integer, it tries to parse it as an integer, and fails over to a double, and then to a string....

If it's not likely an integer, it checks if it's a double, and so on.....

How can this code be faster, you ask? Well, we're almost certainly having clean parses when we do them, and we'll have almost no exceptions... But, is it actually faster?

Here are the results:

Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
  Count    :      100      Average  :   1.6689
  Fastest  :   1.5580      Slowest  :   2.1572
  95Pctile :   1.8012      99Pctile :   2.1572
  TimeBlock : 1.695 1.752 1.709 1.670 1.641 1.648 1.643 1.639 1.662 1.630
  Histogram :   100

Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS)
  Count    :      100      Average  :   1.9580
  Fastest  :   1.8379      Slowest  :   2.5713
  95Pctile :   2.1004      99Pctile :   2.5713
  TimeBlock : 1.978 2.022 1.949 1.966 2.020 1.933 1.890 1.940 1.955 1.928
  Histogram :   100

Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
  Count    :      100      Average  :  69.8886
  Fastest  :  67.1848      Slowest  :  77.2769
  95Pctile :  71.9153      99Pctile :  77.2769
  TimeBlock : 70.940 69.735 69.879 69.381 69.579 69.180 69.611 70.412 70.123 70.045
  Histogram :   100

If you look, you'll see the regex version is Slower than the exception version... it runs in 1.95ms but the exception version runs in 1.67ms

Exceptions

But, there's a catch. In these tests, the stack trace for the exceptions is really small... and the "cost" of an exception depends on the depth of the trace, so let's increase the stack depths for the regex and exception code. Well add a recursive function to simulate a deeper stack:

public void testStringDeepOP(String val, int depth) {
    if (depth <= 0) {
        testStringOP(val);
    } else {
        testStringDeepOP(val, depth - 1);
    }
}


public void testStringDeepRegex(String val, int depth) {
    if (depth <= 0) {
        testStringRegex(val);
    } else {
        testStringDeepRegex(val, depth - 1);
    }
}

and we will test the OP and Regex code a different "depths" of nesting, 5, 10, and 20 layers deep. The benchmark code is:

    UBench bench = new UBench("IntDoubleString Parser")
        .addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
        .addTask("OP D5", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 5), data), s -> expect.equals(s))
        .addTask("OP D10", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 10), data), s -> expect.equals(s))
        .addTask("OP D20", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 20), data), s -> expect.equals(s))
        .addTask("Regex", () -> testFunction((pv, v) -> pv.testStringRegex(v), data), s -> expect.equals(s))
        .addTask("Regex D5", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 5), data), s -> expect.equals(s))
        .addTask("Regex D10", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 10), data), s -> expect.equals(s))
        .addTask("Regex D20", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 20), data), s -> expect.equals(s))
        .addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
    bench.press(10).report("Warmup");
    bench.press(100).report("Final");

What are the results?

Final
=====

Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
  Count    :      100      Average  :   1.7005
  Fastest  :   1.5260      Slowest  :   3.9813
  95Pctile :   1.9346      99Pctile :   3.9813
  TimeBlock : 1.682 1.624 1.612 1.675 1.708 1.658 1.727 1.738 1.672 1.910
  Histogram :    99     1

Task IntDoubleString Parser -> OP D5: (Unit: MILLISECONDS)
  Count    :      100      Average  :   1.9288
  Fastest  :   1.7325      Slowest  :   4.9673
  95Pctile :   2.0897      99Pctile :   4.9673
  TimeBlock : 2.124 1.812 1.828 1.873 1.925 1.877 1.855 1.869 1.903 2.221
  Histogram :    98     2

Task IntDoubleString Parser -> OP D10: (Unit: MILLISECONDS)
  Count    :      100      Average  :   2.2271
  Fastest  :   2.0171      Slowest  :   4.7395
  95Pctile :   2.4904      99Pctile :   4.7395
  TimeBlock : 2.392 2.125 2.129 2.152 2.246 2.169 2.189 2.203 2.247 2.420
  Histogram :    98     2

Task IntDoubleString Parser -> OP D20: (Unit: MILLISECONDS)
  Count    :      100      Average  :   2.9278
  Fastest  :   2.6838      Slowest  :   6.3169
  95Pctile :   3.2415      99Pctile :   6.3169
  TimeBlock : 2.870 2.822 2.860 2.794 2.956 2.861 3.041 3.012 2.853 3.211
  Histogram :    99     1

Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS)
  Count    :      100      Average  :   2.0739
  Fastest  :   1.9338      Slowest  :   3.8368
  95Pctile :   2.2744      99Pctile :   3.8368
  TimeBlock : 2.229 2.083 2.034 2.013 2.021 2.004 2.013 2.096 2.059 2.186
  Histogram :   100

Task IntDoubleString Parser -> Regex D5: (Unit: MILLISECONDS)
  Count    :      100      Average  :   2.0565
  Fastest  :   1.9377      Slowest  :   3.2857
  95Pctile :   2.2646      99Pctile :   3.2857
  TimeBlock : 2.148 2.075 2.035 2.038 2.035 2.031 2.026 2.000 2.032 2.145
  Histogram :   100

Task IntDoubleString Parser -> Regex D10: (Unit: MILLISECONDS)
  Count    :      100      Average  :   2.0647
  Fastest  :   1.9598      Slowest  :   2.6360
  95Pctile :   2.2906      99Pctile :   2.6360
  TimeBlock : 2.073 2.094 2.051 2.048 2.072 2.029 2.057 2.124 2.057 2.042
  Histogram :   100

Task IntDoubleString Parser -> Regex D20: (Unit: MILLISECONDS)
  Count    :      100      Average  :   2.0891
  Fastest  :   1.9930      Slowest  :   2.6483
  95Pctile :   2.2587      99Pctile :   2.6483
  TimeBlock : 2.108 2.070 2.078 2.066 2.071 2.091 2.048 2.090 2.137 2.132
  Histogram :   100

Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
  Count    :      100      Average  :  71.7199
  Fastest  :  67.9621      Slowest  : 152.0714
  95Pctile :  75.2141      99Pctile : 152.0714
  TimeBlock : 71.006 69.896 70.160 69.734 70.824 69.854 71.473 71.888 73.607 78.756
  Histogram :    99     1

Here it is expressed as a table (using the average times):

        0        5        10       20
OP      1.7005   1.9288   2.2271   2.9278
RegEx   2.0739   2.0565   2.0647   2.0891

Conclusion

So, that's the real problem with exceptions, the performance is unpredictable... and, for example, if you run it inside a Tomcat container, with stacks hundreds of levels deep, you may find this completely destroys your performance.

asked	today
viewed	151 times
active	today

current community

your communities

more stack exchange communities

Determine if an input is an integer, a double or a String

3 Answers 3

Background

Scanner option

Alternative

Exceptions

Conclusion

Your Answer

Not the answer you're looking for? Browse other questions tagged java or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Determine if an input is an integer, a double or a String

3 Answers 3

Did you find this question interesting? Try our newsletter

Background

Scanner option

Alternative

Exceptions

Conclusion

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged java or ask your own question.

Related

Hot Network Questions