I'm trying to parse a CSV file into objects. I already have a very efficient CSV parser (SuperCSV); the problem arises when creating the objects from the data being read out. I have a nested foreach that iterates first over a list of maps (each map representing a CSV row) and then over each key within that map (each key representing a column of the row) to create an object. A sketch of how those row maps are produced follows the loop below.
for (Map<String, Object> csvRow : csvRows) {
    Contact contact = new Contact();
    contact.setContactId(nextContactId);
    for (String key : csvRow.keySet()) {
        contact.setDetails(key, csvRow.get(key));
        contactList.addContact(contact);
    }
    nextContactId++;
}
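For context, csvRows above is the complete set of parsed rows held in memory. My actual CSVHelper code isn't shown here, but built with SuperCSV's CsvMapReader it would look roughly like this sketch (the class and method names below are placeholders, not my real code):

import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.supercsv.io.CsvMapReader;
import org.supercsv.io.ICsvMapReader;
import org.supercsv.prefs.CsvPreference;

public class CsvRowLoader {

    // Hypothetical helper: reads every row up front into a List of Maps,
    // which is what the nested foreach above then iterates over.
    public static List<Map<String, Object>> readAllRows(String path) throws IOException {
        ICsvMapReader mapReader = new CsvMapReader(new FileReader(path),
                CsvPreference.STANDARD_PREFERENCE);
        try {
            String[] headers = mapReader.getHeader(true);
            List<Map<String, Object>> csvRows = new ArrayList<Map<String, Object>>();
            Map<String, String> row;
            while ((row = mapReader.read(headers)) != null) {
                csvRows.add(new LinkedHashMap<String, Object>(row));
            }
            return csvRows;
        } finally {
            mapReader.close();
        }
    }
}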
My problem is that the nested foreach above takes roughly 10 seconds to run against a CSV input of about 2MB, and this service has to handle files of at least 200MB efficiently. The reason I'm stuck is that the code below runs in roughly 1.6 seconds, but it only creates a single instance of a Contact object (which is fair enough, and I do see why that happens).
Contact contact = new Contact();
for (Map<String, Object> csvRow : csvRows) {
    contact.setContactId(nextContactId);
    for (String key : csvRow.keySet()) {
        contact.setDetails(key, csvRow.get(key));
        contactList.addContact(contact);
    }
    nextContactId++;
}
Obviously, if I do the latter, I end up with a single instance of a Contact object at the end instead of the many thousands that I should have. However, I have had similar code create and save the objects in about 1.6 seconds (using a while loop rather than the first foreach), so I know it's possible.
Map<String, Object> csvRow;
while ((csvRow = readCsvRow()) != null) {
    Contact contact = new Contact();
    contact.setContactId(nextContactId);
    for (String header : headers) {
        if (csvRow.get(header) != null) {
            contact.setDetails(header, csvRow.get(header));
            contactList.addContact(contact);
        }
    }
    nextContactId++;
}
Edit 1:
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.SequenceGenerator;
import javax.persistence.Table;
import javax.validation.constraints.NotNull;

import org.apache.commons.lang3.builder.ToStringBuilder; // or org.apache.commons.lang.builder.ToStringBuilder
import org.hibernate.annotations.Generated;
import org.hibernate.annotations.GenerationTime;
import org.hibernate.annotations.Type;
import org.joda.time.LocalDateTime;

@Entity
@Table(name = "CONTACT")
public class Contact {

    @Id
    @GeneratedValue(generator = "contactSequence")
    @SequenceGenerator(name = "contactSequence", sequenceName = "CONTACT_SEQ")
    @Column(name = "rowId")
    private Long rowId;

    @Column(length = 255)
    @NotNull
    private Long contactId;

    @Column(length = 255)
    @NotNull
    private String detailKey;

    @Column(length = 255)
    @NotNull
    private String detailValue;

    @Generated(value = GenerationTime.INSERT)
    @Type(type = "org.jadira.usertype.dateandtime.joda.PersistentLocalDateTime")
    private LocalDateTime dateCreated;

    @Generated(value = GenerationTime.ALWAYS)
    @Type(type = "org.jadira.usertype.dateandtime.joda.PersistentLocalDateTime")
    private LocalDateTime dateModified;

    @ManyToOne
    @JoinColumn(name = "contactListJoinId")
    private ContactList contactList;

    public void setDetails(String detailKey, Object detailValue) {
        this.detailKey = detailKey;
        this.detailValue = (String) detailValue;
    }

    public Long getRowId() {
        return rowId;
    }

    public void setRowId(Long rowId) {
        this.rowId = rowId;
    }

    public Long getContactId() {
        return contactId;
    }

    public void setContactId(Long contactId) {
        this.contactId = contactId;
    }

    public String getDetailKey() {
        return detailKey;
    }

    public void setDetailKey(String detailKey) {
        this.detailKey = detailKey;
    }

    public String getDetailValue() {
        return detailValue;
    }

    public void setDetailValue(String detailValue) {
        this.detailValue = detailValue;
    }

    public LocalDateTime getDateCreated() {
        return dateCreated;
    }

    public void setDateCreated(LocalDateTime dateCreated) {
        this.dateCreated = dateCreated;
    }

    public LocalDateTime getDateModified() {
        return dateModified;
    }

    public void setDateModified(LocalDateTime dateModified) {
        this.dateModified = dateModified;
    }

    public ContactList getContactList() {
        return contactList;
    }

    public void setContactList(ContactList contactList) {
        this.contactList = contactList;
    }

    @Override
    public String toString() {
        return ToStringBuilder.reflectionToString(this);
    }
}
SOLUTION: The answer marked as correct gave me an idea of how to fix my issue. It was entirely correct that creating a Map per row and passing the whole collection back to my class was causing a massive memory overhead. I rewrote my CSVHelper class so that it parses only one row at a time and passes the Map representing that row back to the calling method. This fixed my problem instantly, and I'm now reading the 2MB file in 1.6 seconds again.
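A minimal sketch of that row-at-a-time approach, written directly against SuperCSV's CsvMapReader rather than my CSVHelper (whose code isn't shown), and reusing the loop body from the while example above; the class name, file path handling, and starting id are placeholders:

import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

import org.supercsv.io.CsvMapReader;
import org.supercsv.io.ICsvMapReader;
import org.supercsv.prefs.CsvPreference;

public class ContactImporter {

    // Sketch: stream the CSV one row at a time instead of building the full
    // List<Map<String, Object>> first, mirroring the while-loop version above.
    public void importContacts(String path, ContactList contactList) throws IOException {
        long nextContactId = 1; // placeholder starting id
        ICsvMapReader mapReader = new CsvMapReader(new FileReader(path),
                CsvPreference.STANDARD_PREFERENCE);
        try {
            String[] headers = mapReader.getHeader(true);
            Map<String, String> csvRow;
            while ((csvRow = mapReader.read(headers)) != null) {
                Contact contact = new Contact();
                contact.setContactId(nextContactId);
                for (String header : headers) {
                    if (csvRow.get(header) != null) {
                        contact.setDetails(header, csvRow.get(header));
                        contactList.addContact(contact);
                    }
                }
                nextContactId++;
            }
        } finally {
            mapReader.close();
        }
    }
}

Because only the current row's Map is alive at any time, memory use stays flat regardless of file size.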