I'm trying to parse a CSV file into objects. I already have a very efficient CSV parser (SuperCSV); the problem arises when creating the objects from the data being read out. I have a nested foreach that iterates first over a list of maps (each map representing a CSV row) and then over each key within that map (each key representing a column of the row) to create an object. A sketch of how those row maps are produced follows the loop below.
for (Map<String, Object> csvRow : csvRows) {
    Contact contact = new Contact();
    contact.setContactId(nextContactId);
    for (String key : csvRow.keySet()) {
        contact.setDetails(key, csvRow.get(key));
        contactList.addContact(contact);
    }
    nextContactId++;
}
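For context, csvRows above is the complete set of parsed rows held in memory. My actual CSVHelper code isn't shown here, but built with SuperCSV's CsvMapReader it would look roughly like this sketch (the class and method names below are placeholders, not my real code):

import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.supercsv.io.CsvMapReader;
import org.supercsv.io.ICsvMapReader;
import org.supercsv.prefs.CsvPreference;

public class CsvRowLoader {

    // Hypothetical helper: reads every row up front into a List of Maps,
    // which is what the nested foreach above then iterates over.
    public static List<Map<String, Object>> readAllRows(String path) throws IOException {
        ICsvMapReader mapReader = new CsvMapReader(new FileReader(path),
                CsvPreference.STANDARD_PREFERENCE);
        try {
            String[] headers = mapReader.getHeader(true);
            List<Map<String, Object>> csvRows = new ArrayList<Map<String, Object>>();
            Map<String, String> row;
            while ((row = mapReader.read(headers)) != null) {
                csvRows.add(new LinkedHashMap<String, Object>(row));
            }
            return csvRows;
        } finally {
            mapReader.close();
        }
    }
}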
My problem is that the nested foreach above takes roughly 10 seconds to run against a CSV input of about 2MB, and this service has to handle files of at least 200MB efficiently. The reason I'm stuck is that the code below runs in roughly 1.6 seconds, but it only creates a single instance of a Contact object (which is fair enough, and I do see why that happens).
Contact contact = new Contact();
for (Map<String, Object> csvRow : csvRows) {
    contact.setContactId(nextContactId);
    for (String key : csvRow.keySet()) {
        contact.setDetails(key, csvRow.get(key));
        contactList.addContact(contact);
    }
    nextContactId++;
}
Obviously, if I do the latter, I end up with a single instance of a Contact object at the end instead of the many thousands that I should have. However, I have had similar code create and save the objects in about 1.6 seconds (using a while loop rather than the first foreach), so I know it's possible.
Map<String, Object> csvRow;
while ((csvRow = readCsvRow()) != null) {
    Contact contact = new Contact();
    contact.setContactId(nextContactId);
    for (String header : headers) {
        if (csvRow.get(header) != null) {
            contact.setDetails(header, csvRow.get(header));
            contactList.addContact(contact);
        }
    }
    nextContactId++;
}
Edit 1:
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.SequenceGenerator;
import javax.persistence.Table;
import javax.validation.constraints.NotNull;

import org.apache.commons.lang3.builder.ToStringBuilder; // or org.apache.commons.lang.builder.ToStringBuilder
import org.hibernate.annotations.Generated;
import org.hibernate.annotations.GenerationTime;
import org.hibernate.annotations.Type;
import org.joda.time.LocalDateTime;

@Entity
@Table(name = "CONTACT")
public class Contact {

    @Id
    @GeneratedValue(generator = "contactSequence")
    @SequenceGenerator(name = "contactSequence", sequenceName = "CONTACT_SEQ")
    @Column(name = "rowId")
    private Long rowId;

    @Column(length = 255)
    @NotNull
    private Long contactId;

    @Column(length = 255)
    @NotNull
    private String detailKey;

    @Column(length = 255)
    @NotNull
    private String detailValue;

    @Generated(value = GenerationTime.INSERT)
    @Type(type = "org.jadira.usertype.dateandtime.joda.PersistentLocalDateTime")
    private LocalDateTime dateCreated;

    @Generated(value = GenerationTime.ALWAYS)
    @Type(type = "org.jadira.usertype.dateandtime.joda.PersistentLocalDateTime")
    private LocalDateTime dateModified;

    @ManyToOne
    @JoinColumn(name = "contactListJoinId")
    private ContactList contactList;

    public void setDetails(String detailKey, Object detailValue) {
        this.detailKey = detailKey;
        this.detailValue = (String) detailValue;
    }

    public Long getRowId() {
        return rowId;
    }

    public void setRowId(Long rowId) {
        this.rowId = rowId;
    }

    public Long getContactId() {
        return contactId;
    }

    public void setContactId(Long contactId) {
        this.contactId = contactId;
    }

    public String getDetailKey() {
        return detailKey;
    }

    public void setDetailKey(String detailKey) {
        this.detailKey = detailKey;
    }

    public String getDetailValue() {
        return detailValue;
    }

    public void setDetailValue(String detailValue) {
        this.detailValue = detailValue;
    }

    public LocalDateTime getDateCreated() {
        return dateCreated;
    }

    public void setDateCreated(LocalDateTime dateCreated) {
        this.dateCreated = dateCreated;
    }

    public LocalDateTime getDateModified() {
        return dateModified;
    }

    public void setDateModified(LocalDateTime dateModified) {
        this.dateModified = dateModified;
    }

    public ContactList getContactList() {
        return contactList;
    }

    public void setContactList(ContactList contactList) {
        this.contactList = contactList;
    }

    @Override
    public String toString() {
        return ToStringBuilder.reflectionToString(this);
    }
}
SOLUTION: The answer marked as correct gave me an idea of how to fix my issue. It was entirely correct that creating a Map per row and passing the whole collection back to my class was causing a massive memory overhead. I rewrote my CSVHelper class so that it parses only one row at a time and passes the Map representing that row back to the calling method. This fixed my problem instantly, and I'm now reading the 2MB file in 1.6 seconds again.
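A minimal sketch of that row-at-a-time approach, written directly against SuperCSV's CsvMapReader rather than my CSVHelper (whose code isn't shown), and reusing the loop body from the while example above; the class name, file path handling, and starting id are placeholders:

import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

import org.supercsv.io.CsvMapReader;
import org.supercsv.io.ICsvMapReader;
import org.supercsv.prefs.CsvPreference;

public class ContactImporter {

    // Sketch: stream the CSV one row at a time instead of building the full
    // List<Map<String, Object>> first, mirroring the while-loop version above.
    public void importContacts(String path, ContactList contactList) throws IOException {
        long nextContactId = 1; // placeholder starting id
        ICsvMapReader mapReader = new CsvMapReader(new FileReader(path),
                CsvPreference.STANDARD_PREFERENCE);
        try {
            String[] headers = mapReader.getHeader(true);
            Map<String, String> csvRow;
            while ((csvRow = mapReader.read(headers)) != null) {
                Contact contact = new Contact();
                contact.setContactId(nextContactId);
                for (String header : headers) {
                    if (csvRow.get(header) != null) {
                        contact.setDetails(header, csvRow.get(header));
                        contactList.addContact(contact);
                    }
                }
                nextContactId++;
            }
        } finally {
            mapReader.close();
        }
    }
}

Because only the current row's Map is alive at any time, memory use stays flat regardless of file size.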