java.lang.NumberFormatException for input string “1”

Question

So, I have an issue that really bothers me. I have a simple parser that I wrote in Java. Here is the relevant piece of code:

while( (line = br.readLine())!=null)
{
    String splitted[] = line.split(SPLITTER);
    int docNum = Integer.parseInt(splitted[0].trim());
    //do something
}

The input file is a CSV file whose first field is an integer. When I start parsing, I immediately get this exception:

Exception in thread "main" java.lang.NumberFormatException: For input string: "1"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at dipl.parser.TableParser.parse(TableParser.java:50)
at dipl.parser.DocumentParser.main(DocumentParser.java:87)

I checked the file, and it does have 1 as its first value (there are no other characters in that field), but I still get the message. I think it may be related to the file encoding: the file is UTF-8 with Unix line endings, and the program runs on Ubuntu 14.04. Any suggestions on where to look for the problem are welcome.

Nice one using copy and paste to put the error in the question! — T.J. Crowder, 7 hours ago

T.J. Crowder · Accepted Answer · 2016-09-26 18:23:01Z


You have a BOM in front of that number; if I copy what looks like "1" in your question and paste it into vim, I see that you have an FE FF (i.e., a BOM) in front of it. From that link:

The exact bytes comprising the BOM will be whatever the Unicode character U+FEFF is converted into by that transformation format.
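To make the quoted sentence concrete, here is a minimal sketch that encodes U+FEFF in three transformation formats and prints the resulting bytes. The class name BomBytes and the helper hex are made up for this illustration; only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;

public class BomBytes {
    // Hypothetical helper: render bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bom = "\uFEFF"; // the single character U+FEFF

        // The same character becomes different byte sequences per encoding.
        System.out.println("UTF-8:    " + hex(bom.getBytes(StandardCharsets.UTF_8)));    // EF BB BF
        System.out.println("UTF-16BE: " + hex(bom.getBytes(StandardCharsets.UTF_16BE))); // FE FF
        System.out.println("UTF-16LE: " + hex(bom.getBytes(StandardCharsets.UTF_16LE))); // FF FE
    }
}
```

So "FE FF" is both the code point's hex value and its UTF-16 big-endian byte sequence, which is why the two are easy to mix up.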

So that's the issue. Consume the file with a reader that is appropriate for the transformation format the file is encoded with (UTF-8, UTF-16 big-endian, UTF-16 little-endian, etc.). See also this question and its answers for more about reading Unicode files in Java.
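One possible way to apply this, sketched under assumptions: if the file is UTF-8 with a BOM, Java's UTF-8 decoder passes the BOM through as the character U+FEFF, and String.trim() does not remove it (trim only strips characters up to U+0020). The sketch below drops a leading U+FEFF from the first line before parsing. The class BomAwareParse and the helpers stripBom and firstField are hypothetical names, and the in-memory byte array stands in for the real CSV file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BomAwareParse {
    // Hypothetical helper: drop one leading U+FEFF if present.
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == '\uFEFF') ? s.substring(1) : s;
    }

    // Hypothetical helper: read the first line and parse its first field as an int.
    static int firstField(BufferedReader br, String splitter) throws IOException {
        String line = br.readLine();
        if (line == null) {
            throw new IOException("empty input");
        }
        // Only the very first line of a file can start with a BOM.
        line = stripBom(line);
        return Integer.parseInt(line.split(splitter)[0].trim());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the CSV file: UTF-8 BOM (EF BB BF) followed by "1,a\n".
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '1', ',', 'a', '\n'};

        // Name the charset explicitly instead of relying on the platform default.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            System.out.println(firstField(br, ","));
        }
    }
}
```

In the loop from the question, the same stripBom call would apply only to the first line read. Re-saving the file without a BOM would avoid the problem at the source.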

That's a UTF-16 BOM. UTF-8 doesn't need a BOM, but if you add one the byte sequence is EF BB BF. – Doval 51 mins ago

@Doval: Thank you, I was absolutely wrong to say it was a UTF-8 BOM, and you're quite right that on-the-wire, the BOM for UTF-8 is EF BB BF. But what we're looking at is the end result of reading the file and then seeing the output in the error message. The file might be in any transformation; all BOMs end up being FE FF once read. – T.J. Crowder 38 mins ago

But if it was read raw, then...oh, I don't know. :-) Could well have been UTF-16. :-) It'll all depend on how the file was read into the stream. – T.J. Crowder 21 mins ago
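The point about "how the file was read" can be checked with a small experiment. This is only a sketch with made-up names (BomDecoding, parses): it decodes a BOM-prefixed "1" from UTF-8 bytes and from UTF-16 bytes and tests whether Integer.parseInt accepts the result. In the JDK, the UTF-8 decoder keeps the BOM as U+FEFF, while the generic UTF-16 decoder consumes it as a byte-order signal.

```java
import java.nio.charset.StandardCharsets;

public class BomDecoding {
    // Hypothetical helper: true if Integer.parseInt accepts the string.
    static boolean parses(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "1" preceded by a BOM, encoded two different ways.
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x31};
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x31};

        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        String fromUtf16 = new String(utf16, StandardCharsets.UTF_16);

        // UTF-8: the BOM survives decoding as a leading U+FEFF, so parsing fails.
        System.out.println(fromUtf8.length() + " " + parses(fromUtf8));   // 2 false

        // UTF-16: the decoder uses the BOM to pick the byte order and drops it.
        System.out.println(fromUtf16.length() + " " + parses(fromUtf16)); // 1 true
    }
}
```

This matches the symptom in the question: a UTF-8 file with a BOM, read as UTF-8, yields a first field that looks like "1" but is two characters long.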
