An idea:
// Review sketch: open fileUrl either from the classpath (when a local
// "<name>.gz" file exists) or over the network, and wrap it as a buffered
// UTF-8 reader over gzip content.
private BufferedReader getReader(final String fileUrl) throws IOException {
    final String filename = getFilename(fileUrl) + ".gz";
    final InputStream stream;
    if (new File(filename).isFile()) {
        // NOTE(review): the existence check uses the local file name but the
        // lookup uses fileUrl — confirm both refer to the same resource.
        stream = ClassLoader.getSystemResourceAsStream(fileUrl);
        if (stream == null) {
            // getSystemResourceAsStream() returns null when the resource is
            // missing; fail loudly instead of NPE-ing inside GZIPInputStream.
            throw new IOException("Classpath resource not found: " + fileUrl);
        }
        System.out.println("Using tdat header from classes directory");
    } else {
        // Only build the URL when it is actually needed, so a malformed URL
        // cannot break the classpath branch.
        stream = new URL(fileUrl).openStream();
    }
    final GZIPInputStream gzipStream = new GZIPInputStream(stream);
    // Explicit charset: the platform default varies from system to system.
    final InputStreamReader gzipStreamReader =
            new InputStreamReader(gzipStream, "UTF-8");
    return new BufferedReader(gzipStreamReader);
}
You don't have to close the GZIPInputStream explicitly — closing the outer Reader (Reader.close()) closes the wrapped streams as well.
I'd invert the condition inside the while
loop:
// start processing
// NOTE(review): ready() only reports that a read would not block; it is not
// a reliable end-of-stream test — TODO confirm, prefer readLine() != null.
while (reader.ready()) {
final String line = reader.readLine();
// Guard clause: skip lines that are not pipe-delimited records, keeping
// the happy path un-nested.
if (!line.matches("^(.*?\\|)*$")) {
continue;
}
// Fresh map per record; filled in by the omitted code below.
Map<String, String> result = new HashMap<String, String>();
...
}
It flattens the code, which makes it easier to read.
This code may be unnecessary, since no one uses the template object:
// create a template so I only have to create a map once
// NOTE(review): this template map is never read again in the code shown —
// it is probably dead code and can be removed. LinkedHashMap preserves the
// catalog's field order; every field starts out mapped to null.
final Map<String, String> template =
new LinkedHashMap<String, String>(catalog.getFieldData().size());
for (final String fieldName : catalog.getFieldData().keySet()) {
template.put(fieldName, null);
}
You should specify the charset when you call the constructor of the InputStreamReader.
// Passing the charset explicitly avoids relying on the platform default.
final InputStreamReader gzipStreamReader =
new InputStreamReader(gzipStream, "UTF-8");
Omitting it could lead to 'interesting' surprises, since it will use the default charset, which varies from system to system.
Here is the code after a few more method extractions. Check the comments please and feel free to ask if something isn't clear.
/**
 * Imports the tdat file found at the given URL.
 *
 * Any I/O failure is translated into the application-level MyAppException
 * (MalformedURLException is an IOException subclass, so it is covered too).
 */
public void importTdatFile(final String fileUrl) throws MyAppException {
    try {
        doImport(fileUrl);
    } catch (final IOException cause) {
        throw new MyAppException("Cannot import", cause);
    }
}
/**
 * Streams the tdat file line by line and writes the converted JSON output.
 *
 * Fix: the original loop condition used reader.ready(), which only reports
 * whether the next read would block — it is NOT an end-of-stream test and
 * can terminate the loop before the input is exhausted. Reading until
 * readLine() returns null is the reliable EOF condition.
 */
private void doImport(final String fileUrl) throws IOException {
    BufferedReader reader = null;
    BufferedWriter writer = null;
    try {
        reader = getReader(fileUrl);
        writer = getWriter();
        String line;
        while ((line = reader.readLine()) != null) {
            final Map<String, String> results = processLine(line);
            filterResults(results);
            final String jsonLine = getJsonLine(results);
            // NOTE(review): assumes getJsonLine() includes a line terminator —
            // TODO confirm, otherwise all records end up on a single line.
            writer.write(jsonLine);
        }
    } finally {
        closeQuetly(reader);
        closeQuetly(writer);
    }
}
/** Builds a buffered UTF-8 gzip reader over the stream resolved from fileUrl. */
private BufferedReader getReader(final String fileUrl) throws IOException {
    return createGzipReader(getStream(fileUrl));
}
/**
 * Resolves the input stream for the given URL: loaded from the classpath
 * when a matching local ".gz" file exists, otherwise opened over the network.
 *
 * Fix: MalformedURLException is a subclass of IOException, so declaring
 * both in the throws clause was redundant.
 */
private InputStream getStream(final String fileUrl) throws IOException {
    final InputStream stream;
    // TODO(review): refactor this 'if' — a local-file existence check that
    // drives a classpath lookup is suspicious; confirm the two really refer
    // to the same resource.
    if (isGzipFile(fileUrl)) {
        stream = ClassLoader.getSystemResourceAsStream(fileUrl);
        // TODO(review): replace this println with a proper logger call.
        System.out.println("Using tdat header from classes directory");
    } else {
        final URL url = new URL(fileUrl);
        stream = url.openStream();
    }
    return stream;
}
/**
 * Wraps the raw stream as a buffered UTF-8 reader over gzip content.
 *
 * Fix: UnsupportedEncodingException is a subclass of IOException, so the
 * throws clause only needs IOException — and "UTF-8" is a charset every
 * JVM is required to support, so it is never actually thrown here.
 */
private BufferedReader createGzipReader(final InputStream stream)
        throws IOException {
    final GZIPInputStream gzipStream = new GZIPInputStream(stream);
    // Explicit charset — the platform default varies between systems.
    final InputStreamReader gzipStreamReader =
            new InputStreamReader(gzipStream, "UTF-8");
    return new BufferedReader(gzipStreamReader);
}
/**
 * Reports whether a local ".gz" file exists for the given URL.
 *
 * NOTE(review): despite the name, this only checks for the file's existence
 * on disk; it does not inspect gzip magic bytes — confirm the intended
 * semantics with the caller.
 */
private boolean isGzipFile(final String fileUrl) {
    return new File(getFilename(fileUrl) + ".gz").isFile();
}
/**
 * Opens a buffered UTF-8 writer for the JSON output file.
 *
 * Fix (resolves the TODO): FileWriter always uses the platform-default
 * encoding; writing through an OutputStreamWriter over a FileOutputStream
 * with an explicit charset keeps the output stable across systems.
 * (Types are fully qualified so the file's import section is untouched.)
 */
private BufferedWriter getWriter() throws IOException {
    final String outputFilename = getOutputFilename();
    final java.io.OutputStreamWriter fileWriter = new java.io.OutputStreamWriter(
            new java.io.FileOutputStream(outputFilename), "UTF-8");
    return new BufferedWriter(fileWriter);
}
/** Derives the output file name from the catalog name ("<name>.json"). */
private String getOutputFilename() {
    return String.format("%s.json", catalog.getName());
}
/**
 * Splits a pipe-delimited tdat line into a field-name -> field-value map,
 * keeping only fields contained in the catalog's field-data set.
 * Non-processable lines yield an empty map.
 */
private Map<String, String> processLine(final String line) {
final Map<String, String> result = new HashMap<String, String>();
if (!isProcessableLine(line)) {
return result;
}
// It's hard to refactor without the internals of catalog.
final String[] fieldValues = line.split("\\|");
for (int i = 0; i < fieldValues.length; i++) {
// TODO: possible ArrayIndexOutOfBoundsException here?
// (presumably getFieldName(i) fails when the line has more values than
// the catalog has fields — verify against the catalog implementation)
final String fieldName = catalog.getFieldName(i);
final FieldData fieldData = catalog.getFieldData(fieldName);
if (catalog.fieldDataSetContains(fieldData)) {
final String fieldValue = fieldValues[i];
result.put(fieldName, fieldValue);
}
}
return result;
}
/**
 * Post-processes one parsed record in place: drops null values, removes
 * unwanted fields, then normalizes field prefixes and names.
 */
private void filterResults(final Map<String, String> results) {
removeNulls(results);
// TODO: catalog probably a field, so it should be visible inside these
// methods without passing them as a parameter
removeUnwantedFields(results, catalog);
fixFieldPrefixes(results, catalog);
fixFieldNames(results, catalog);
}
/**
 * A line is processable when it consists solely of '|'-terminated segments
 * (the empty string matches as well).
 */
private boolean isProcessableLine(final String line) {
    // TODO: a precompiled regexp may be faster, but that would be premature
    // optimization — don't do it without profiling first.
    // (String.matches delegates to exactly this call.)
    return java.util.regex.Pattern.matches("^(.*?\\|)*$", line);
}
/**
 * Closes the given resource, reporting (rather than propagating) any
 * IOException. A null argument is silently ignored.
 *
 * NOTE(review): the name has a typo ("Quetly" -> "Quietly") but is kept
 * unchanged so existing callers keep compiling.
 */
private void closeQuetly(final Closeable closeable) {
    try {
        if (closeable != null) {
            closeable.close();
        }
    } catch (final IOException e) {
        // TODO: log the exception with a logger instead of printStackTrace().
        e.printStackTrace();
    }
}
Anyway, your streaming approach is fine, you shouldn't read the whole file into the memory.