I have to parse a file line by line and split each line on ",". The first string is the name and the second is the count. Finally, I have to display each key with its total count.
For example
Peter,2
Smith,3
Peter,3
Smith,5
I should display as Peter 5 and Smith 8.
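To make the expected result concrete, here is a minimal sketch of the aggregation on in-memory lines. It assumes Java 8 or later (for Map.merge); the class name KeyCountSketch and the method sum are made up for illustration and are not part of my project.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: sums the counts per name for lines shaped like "Name,count".
final class KeyCountSketch {

    static Map<String, Long> sum(List<String> lines) {
        // LinkedHashMap keeps first-seen order, which makes the output predictable.
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // skip lines that are not exactly "name,count"
            }
            // merge() stores the count for a new key, or adds it to the existing total.
            totals.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Peter,2", "Smith,3", "Peter,3", "Smith,5");
        System.out.println(sum(lines)); // prints {Peter=5, Smith=8}
    }
}
```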
So I was unsure whether to choose BufferedReader or Scanner. I went through the link and came up with these two approaches. I would like your feedback on them.
Approach 1: use BufferedReader
    private HashMap<String, MutableLong> readFile(File file) throws IOException {
        final HashMap<String, MutableLong> keyHolder = new HashMap<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), "UTF-8"))) {
            for (String line; (line = br.readLine()) != null;) {
                // processing the line.
                final String[] keyContents = line
                        .split(KeyCountExam.COMMA_DELIMETER);
                if (keyContents.length == 2) {
                    final String keyName = keyContents[0];
                    final long count = Long.parseLong(keyContents[1]);
                    final MutableLong keyCount = keyHolder.get(keyName);
                    if (keyCount != null) {
                        // The map already holds this instance, so no second put is needed.
                        keyCount.add(count);
                    } else {
                        keyHolder.put(keyName, new MutableLong(count));
                    }
                }
            }
        }
        return keyHolder;
    }
    private static final String COMMA_DELIMETER = ",";
    private static volatile Pattern commaPattern = Pattern.compile(COMMA_DELIMETER);
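On the question of the extra objects created by split: one possible alternative within Approach 1 is to locate the comma with indexOf and parse the count directly from the line. The sketch below assumes Java 9 or later, because it relies on the Long.parseLong(CharSequence, int, int, int) overload; the names LineParseSketch and addLine are hypothetical. It still creates one String for the key, but no array and no substring for the count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: parses "name,count" without calling String.split.
final class LineParseSketch {

    /** Adds the line's count to totals; returns false if the line is not "name,count". */
    static boolean addLine(Map<String, Long> totals, String line) {
        int comma = line.indexOf(',');
        // Reject lines with no comma, more than one comma, or nothing after the comma.
        if (comma < 0 || comma != line.lastIndexOf(',') || comma == line.length() - 1) {
            return false;
        }
        String key = line.substring(0, comma);
        // Java 9+: parses the digits in place, without creating a substring first.
        long count = Long.parseLong(line, comma + 1, line.length(), 10);
        totals.merge(key, count, Long::sum);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new HashMap<>();
        addLine(totals, "Peter,2");
        addLine(totals, "Peter,3");
        System.out.println(totals.get("Peter")); // prints 5
    }
}
```

Whether this is measurably faster than split on a single-character delimiter is something I have not benchmarked.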
I have used MutableLong, since I don't want to create a new BigInteger each time. Also, the file may be very large, and I have no control over how many times a key can occur.
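For readers who want to see the idea, a wrapper of this kind could look roughly like the sketch below. This is an illustrative stand-in, not my actual class: the outer class MutableLongSketch and the accumulate method are made up, and the use of Math.addExact (Java 8+) to fail loudly on overflow is an assumption added here.

```java
import java.util.HashMap;
import java.util.Map;

final class MutableLongSketch {

    /** Hypothetical stand-in for a hand-written wrapper around a primitive long. */
    static final class MutableLong {
        private long value;

        MutableLong(long initial) {
            this.value = initial;
        }

        void add(long delta) {
            // Assumption: throw ArithmeticException instead of silently wrapping around.
            value = Math.addExact(value, delta);
        }

        long get() {
            return value;
        }
    }

    /** Adds count to the running total for key, mutating the stored wrapper in place. */
    static void accumulate(Map<String, MutableLong> totals, String key, long count) {
        MutableLong existing = totals.get(key);
        if (existing == null) {
            totals.put(key, new MutableLong(count));
        } else {
            existing.add(count); // same instance is already in the map, no put needed
        }
    }

    public static void main(String[] args) {
        Map<String, MutableLong> totals = new HashMap<>();
        accumulate(totals, "Peter", 2);
        accumulate(totals, "Peter", 3);
        System.out.println(totals.get("Peter").get()); // prints 5
    }
}
```

The point of the mutable wrapper is that each update changes the object already stored in the map, so no new Long or BigInteger is allocated per line.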
Approach 2: use Scanner with two delimiters
    private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]";
    private static final String LINE_PATTERN = ".*(" + LINE_SEPARATOR_PATTERN + ")|.+$";
    private static volatile Pattern linePattern = Pattern.compile(LINE_PATTERN);
I have gone through the hasNext implementation in Scanner, and to me there is no harm in switching the Pattern. I also believe that, from Java 7, Scanner has a limited buffer, which should be enough for this kind of file.
Does anyone prefer Approach 2 over Approach 1, or is there any other option? I only used System.out.println here for testing; the same aggregation code from Approach 1 would obviously replace it. Using split in Approach 1 creates multiple String instances, which can be avoided here by scanning the character sequence (am I right?).
    private HashMap<String, BigInteger> readFileScanner(File file)
            throws IOException {
        final HashMap<String, BigInteger> keyHolder = new HashMap<>();
        try (Scanner br = new Scanner(file, "UTF-8")) {
            while (br.hasNext()) {
                br.useDelimiter(commaPattern);
                System.out.println(br.next());
                System.out.println(br.next());
                br.useDelimiter(linePattern);
            }
        }
        return keyHolder;
    }
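A variant I am also considering avoids switching delimiters altogether: set one delimiter that matches either a comma or a run of line breaks, then read tokens in pairs. The sketch below is only an illustration under assumptions: it needs Java 8 or later (for \R in the regex and for Map.merge), the names ScannerSketch and FIELD_OR_LINE are made up, and it reads from any Readable so that it can be tried on a String without a file.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper: one Scanner delimiter covering both field and line boundaries.
final class ScannerSketch {

    // A comma, or one or more line breaks (\R covers \r\n, \n, \r and Unicode separators).
    private static final Pattern FIELD_OR_LINE = Pattern.compile(",|\\R+");

    static Map<String, Long> sum(Readable source) {
        Map<String, Long> totals = new LinkedHashMap<>();
        try (Scanner in = new Scanner(source)) {
            in.useDelimiter(FIELD_OR_LINE);
            in.useLocale(Locale.ROOT); // keep number parsing independent of the default locale
            while (in.hasNext()) {
                String key = in.next();
                if (!in.hasNextLong()) {
                    break; // stop at the first malformed record in this sketch
                }
                totals.merge(key, in.nextLong(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        String data = "Peter,2\nSmith,3\nPeter,3\nSmith,5\n";
        System.out.println(sum(new StringReader(data))); // prints {Peter=5, Smith=8}
    }
}
```

For a real file, new Scanner(file, "UTF-8") could be used in place of the Readable, as in the method above. Note that next() still creates a String per token, so this does not remove the allocations that split causes.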
UPDATED:
Of course the default choice would be opencsv, but I cannot use open-source APIs other than the native Java API, so I have not chosen it. MutableLong is a class I created in my project by wrapping a primitive long variable; it is not the one from Apache. I used that name and gave a link so that the idea could be easily understood.