Design Strategy of CSV Parser

Question

I would like a review of my design strategy for a CSV parser. I have 4 CSV files with different layouts, as shown below. Each row of a CSV file is mapped to an object. For example, if CSV_FILE_A has 10 rows, calling the parser function returns a list of 10 ObjA objects. For CSV_FILE_B with 5 entries, it returns a list of 5 ObjB objects.

CSV_FILE_A: LONG, STR, STR
CSV_FILE_B: LONG, LONG
CSV_FILE_C: LONG, LONG, STR, LONG
CSV_FILE_D: LONG, LONG, LONG

I designed an interface like this.

public interface CSVParser<T> { 
    public List<T> parseCSVFile(String fileName);
}

I have created 4 different parsers that implement the interface, each with its own logic for parsing its CSV file. The code below is the complete implementation for CSV_FILE_B. The other three parsers have the same structure but different if-else statements for setting object fields and a different element type in the returned List.

public class csvBParser implements CSVParser {

    @Override   
    public List<ObjectB> parseCSVFile(String fileName) {

        List<ObjectB> list = new ArrayList<ObjectB>();

        try {
            ClassLoader classLoader = Thread.currentThread()
                    .getContextClassLoader();
            InputStream is = classLoader.getResourceAsStream(fileName);
            BufferedReader br = new BufferedReader(new InputStreamReader(is));

            String line = "";
            StringTokenizer token = null;
            int lineNum = 0, tokenNum = 0;

            while ((line = br.readLine()) != null) {        
                if (lineNum == 0){
                    lineNum++;
                    continue;
                }                   

                token = new StringTokenizer(line, ",");
                ObjectB objB = new ObjectB();

                while (token.hasMoreTokens()) {
                    String nextToken = token.nextToken();

                    boolean isTokenNull = false;
                    if (nextToken.equalsIgnoreCase("null"))
                        isTokenNull = true;

                    if (!isTokenNull) {
                        if (tokenNum == 0) {
                            objB.setObjBId(Long.valueOf(nextToken));
                        } else if (tokenNum == 1) {
                            objB.setObjBSndId(Long.valueOf(nextToken));
                        } else {
                            System.err.println("Invalid Token Number "
                                    + tokenNum);
                        }
                    }
                    tokenNum++;
                }

                list.add(objB);
                tokenNum = 0;
                lineNum++;
            }
        } catch (Exception e) {
            System.err.println("Parse Error " + e.getMessage());
        } 

        return list;
    }    
}

Finally, I use a factory to create an instance of a parser.

public class CSVParserFactory {

    public static CSVParser getParser(TableType type) {

        CSVParser parser = null;
        if(type == A){
            parser = new csvAParser();
        }
        else if(type == B){
            parser = new csvBParser();
        }
        else if(type == C){
            parser = new csvCParser();
        }
        else if(type == D){
            parser = new csvDParser();
        }
        /*
         * more parsers will be added later
         */
        else{
            throw new IllegalArgumentException ("No such table type");
        }
        return parser;
    }
}

The parser is called like this. Each element in the list corresponds to a row in the CSV file.

CSVParser parser = CSVParserFactory.getParser(TableType.A);
List list = parser.parseCSVFile(CSV_FILE_A);

Do you have any suggestions about this design? If I want to factor out the common code, what do I need to do?

Thanks in advance.

Do you really need a different type to be returned for each file type? I think a better design would be something like a record set: one class representing a row of data, with properties such as getType(), getNumberValues() ... and methods such as getValueAsString(int index), getValueAsLong(int index), ... – tgkprog, Apr 23 '13 at 16:10

Joe F · Answer 1 · 2013-04-23 10:53:48Z

First, there would be a lot of duplication between the parser classes. This kind of duplication should be avoided: if you find a bug in the way the CSV is being parsed, you have to fix it in multiple places. Consider using the template method pattern. In this scenario, you would extract the common logic into an abstract superclass with one or more abstract "template methods" that are implemented by the subclasses. For instance, the abstract superclass could parse the fields of each line into an array of strings, then pass the array to a method that constructs an object from those fields.

public abstract class AbstractCSVParser<T> implements CSVParser<T> {

    public List<T> parseCSVFile(String fileName) {
        List<T> list = new ArrayList<T>();

        // for each line...
        String[] fields = // parse line into an array of strings
        T obj = buildObject(fields);
        list.add(obj);

        return list;
    }

    protected abstract T buildObject(String[] fields);

}

public class csvBParser extends AbstractCSVParser<ObjectB> {

    protected ObjectB buildObject(String[] fields) {
        ObjectB obj = new ObjectB();
        // populate object with fields
        return obj;
    }

}
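
To make the outline above concrete, here is a minimal, self-contained sketch of the same idea. It is only an illustration under some assumptions: Java 8 or later, a header row that is skipped, a naive split on commas (no quoted fields), and the literal "null" meaning "no value", as in the question. The names RowB, LineParser, RowBParser and toLong are hypothetical stand-ins, and the method takes a Reader so the sketch can run without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ObjectB from the question.
class RowB {
    Long id;
    Long secondId;
}

// The superclass owns the reading loop; subclasses only map fields to an object.
abstract class LineParser<T> {

    List<T> parse(Reader source) {
        List<T> rows = new ArrayList<T>();
        try (BufferedReader br = new BufferedReader(source)) {
            br.readLine(); // skip the header row, as the question's code does
            String line;
            while ((line = br.readLine()) != null) {
                // Assumption: plain comma-separated values without quoting.
                rows.add(buildObject(line.split(",")));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Shared helper: the literal "null" (any case) means "no value".
    static Long toLong(String token) {
        String trimmed = token.trim();
        return trimmed.equalsIgnoreCase("null") ? null : Long.valueOf(trimmed);
    }

    // The template method: each subclass builds its own row type.
    protected abstract T buildObject(String[] fields);
}

class RowBParser extends LineParser<RowB> {
    @Override
    protected RowB buildObject(String[] fields) {
        RowB row = new RowB();
        row.id = toLong(fields[0]);
        row.secondId = toLong(fields[1]);
        return row;
    }
}
```

With this in place, new RowBParser().parse(new java.io.StringReader("id,sndId\n1,2\n")) returns a list with a single RowB. A parser for another layout would only need its own buildObject; the reading loop and the null handling live in one place.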

In the parsing code, the InputStream and BufferedReader are never closed. Make sure you close resources when you are done with them, and make sure it is happening inside a finally block.

BufferedReader br = new BufferedReader(new InputStreamReader(is));
try {
    // do work with br
}
finally {
    br.close();
}

If you are using Java 7, you should check out the try-with-resources statement. The above example could be written like this:

try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
    // do work with br
} // close is called automatically at the end of the try block

In this example, the parser classes appear to be stateless. Given this, CSVParserFactory doesn't really need to return a new instance every time. One option to simplify CSVParserFactory is to initialize a static Map containing the parser instances.

public class CSVParserFactory {

    private static final Map<TableType, CSVParser> PARSERS = new HashMap<TableType, CSVParser>() {{
        put(A, new csvAParser());
        put(B, new csvBParser());
        put(C, new csvCParser());
        // etc
    }};

    public static CSVParser getParser(TableType type) {
        CSVParser parser = PARSERS.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("No such table type");
        }
        return parser;
    }

}

If you don't want this, and still want to create the parser instances each time, you could use the same strategy with the class names instead of instances, then create the instance using reflection.
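
A rough sketch of that reflection variant could look like the following. TableKind, Parser, ParserA, ParserB and ReflectiveParserFactory are hypothetical stand-ins for the types in the question, and the sketch assumes every parser class has an accessible no-argument constructor.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-ins for TableType, CSVParser and the concrete parsers.
enum TableKind { A, B }

interface Parser { }

class ParserA implements Parser { }

class ParserB implements Parser { }

class ReflectiveParserFactory {

    // Maps each table kind to the class to instantiate, not to an instance.
    private static final Map<TableKind, Class<? extends Parser>> PARSER_CLASSES =
            new EnumMap<TableKind, Class<? extends Parser>>(TableKind.class);

    static {
        PARSER_CLASSES.put(TableKind.A, ParserA.class);
        PARSER_CLASSES.put(TableKind.B, ParserB.class);
    }

    static Parser getParser(TableKind kind) {
        Class<? extends Parser> parserClass = PARSER_CLASSES.get(kind);
        if (parserClass == null) {
            throw new IllegalArgumentException("No such table type: " + kind);
        }
        try {
            // Creates a fresh instance on every call via the no-arg constructor.
            return parserClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + parserClass.getName(), e);
        }
    }
}
```

Each call to ReflectiveParserFactory.getParser(TableKind.A) then returns a new ParserA. Adding a parser is still one line in the static block, at the cost of moving constructor errors from compile time to run time.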

Finally, you may want to consider passing an InputStream to parseCSVFile instead of a file name. This would allow more flexibility of the source of the data. The current implementation is very specific as to the source.
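
As an illustration of that last point, the sketch below reads rows from any InputStream and keeps the classpath lookup as a thin convenience method on top. StreamCsvReader, readRows and readRowsFromClasspath are made-up names, UTF-8 is an assumed encoding, and the comma split is as naive as in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StreamCsvReader {

    // Reads every data row (header skipped) from any InputStream. The caller
    // opened the stream, so the caller remains responsible for closing it.
    static List<String[]> readRows(InputStream in) {
        List<String[]> rows = new ArrayList<String[]>();
        try {
            BufferedReader br = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            br.readLine(); // skip the header row
            String line;
            while ((line = br.readLine()) != null) {
                rows.add(line.split(","));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Convenience method that keeps the question's classpath lookup.
    static List<String[]> readRowsFromClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + resourceName);
            }
            return readRows(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same readRows method then works for a FileInputStream, a network stream, or a ByteArrayInputStream in a unit test, without the parser knowing where the bytes come from.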

h.j.k. · Answer 2 · 2013-04-23 04:07:36Z

For starters, consider having an abstract class do the line-by-line reading, and have your concrete classes implement the code that converts one line into the desired object.

Also, the multiple if-else-if can be slightly improved by replacing it with a switch.
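
For the token handling, such a switch might look roughly like this. PairRow and PairRowMapper are hypothetical stand-ins for ObjectB and its parser, and throwing on an unexpected column (instead of printing to System.err as the question does) is an assumption of this sketch.

```java
// Hypothetical stand-in for ObjectB from the question.
class PairRow {
    Long id;
    Long secondId;
}

class PairRowMapper {

    // Assigns one token to the field selected by its column index.
    static void setField(PairRow row, int columnIndex, String token) {
        if (token.equalsIgnoreCase("null")) {
            return; // leave the field unset, as the question's code does
        }
        switch (columnIndex) {
            case 0:
                row.id = Long.valueOf(token);
                break;
            case 1:
                row.secondId = Long.valueOf(token);
                break;
            default:
                // Assumption: fail fast rather than log and continue.
                throw new IllegalArgumentException("Invalid token number " + columnIndex);
        }
    }
}
```

The behaviour matches the if-else chain; the gain is mostly readability, since each column gets its own clearly separated case.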

Code examples would be a +1 – Jeff Vanzella Apr 23 '13 at 5:02

tgkprog · Answer 3 · 2013-04-23 15:12:56Z

The public interface CSVParser looks good, and you should also have a base class for parsers. Then, for now, you need just one concrete class, which takes a List in which each element is a CSV column type. The list has as many elements as the CSV has columns, and each element is the data type of the corresponding column: string or long.

// If the parser is thread safe, declare csvParserA as a static field of the
// class; otherwise declare it inside the method.
public static CSVParser getParser(TableType type) {
    if (type == A) {
        // Type A has the columns LONG, STR, STR.
        if (csvParserA == null) { // cache it
            List<CSVType> lst = new ArrayList<CSVType>();
            lst.add(CSVType.LONG);
            lst.add(CSVType.STR);
            lst.add(CSVType.STR);
            csvParserA = new CsvParser(lst);
        }
        return csvParserA;
    } else if (type == B) {
        // likewise: build the column list, create the parser and cache it
    }
    throw new IllegalArgumentException("No such table type");
}

The implementation can then make use of the information given in the constructor. That way you have a single piece of code for all four types: get it right once and it works for all of them.

Only cache the parser if it's thread safe; otherwise create a new one every time. I think it is thread safe, as no instance data changes once it is set in the constructor.
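
To give an idea of what such a single parser could look like, here is a hedged sketch. ColumnType, TypedCsvParser and parseLine are hypothetical names (the CSVType and CsvParser classes above are not shown anywhere), each row is returned as a plain list of values rather than a typed object, and the comma split ignores quoting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical column types, mirroring LONG and STR from the question.
enum ColumnType { LONG, STR }

// One parser for every layout: the column types are fixed in the constructor
// and never change afterwards, so an instance can be shared between threads.
final class TypedCsvParser {

    private final List<ColumnType> columns;

    TypedCsvParser(List<ColumnType> columns) {
        // Defensive copy so later changes to the caller's list have no effect.
        this.columns = Collections.unmodifiableList(new ArrayList<ColumnType>(columns));
    }

    // Converts one line into a list of values: Long for LONG, String for STR,
    // and null for the literal token "null".
    List<Object> parseLine(String line) {
        String[] tokens = line.split(",", -1);
        if (tokens.length != columns.size()) {
            throw new IllegalArgumentException(
                    "Expected " + columns.size() + " columns but found " + tokens.length);
        }
        List<Object> values = new ArrayList<Object>();
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i].trim();
            if (token.equalsIgnoreCase("null")) {
                values.add(null);
            } else if (columns.get(i) == ColumnType.LONG) {
                values.add(Long.valueOf(token));
            } else {
                values.add(token);
            }
        }
        return values;
    }
}
```

For CSV_FILE_A the parser would be built with the list LONG, STR, STR, and parseLine("7,abc,null") would yield the Long 7, the String "abc" and null. Because parseLine only touches local variables and the immutable column list, caching one instance per table type is safe in this sketch.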

I did not show the class declaration, but it would be a static class-level instance if it's thread safe (and only then would you check whether it's null and initialize it the first time only). If it's not thread safe, you would declare and initialize it in the if(type == A) block (and do away with the null check). – tgkprog, Apr 23 '13 at 15:11

