I have a database table with numerous fields, including "category". The source of this table is an XML file which the system receives occasionally. Each XML file contains data for one category. The table needs to be updated with the new category data from the XML file. The XML file includes all data for that category, not just the changes.

I see two possible ways of handling this:

  1. First delete all rows from the table where category = categoryID, then insert rows based on all the XML data (see the sketch after this list). Obviously, the delete and insert operations would be contained within a single transaction.

  2. For each record in the XML, first do a select to test whether that record exists. If it does exist, update it with the data from the XML. If it doesn't exist, insert it.
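
A minimal sketch of the first approach in T-SQL; the table name (Products), its columns, and the parameters are hypothetical stand-ins for your actual schema:

    BEGIN TRANSACTION;

    -- Remove every existing row for the category covered by this file.
    DELETE FROM Products
    WHERE Category = @CategoryID;

    -- Re-insert all rows parsed from the XML (executed once per record).
    INSERT INTO Products (Id, Category, Name, Price)
    VALUES (@Id, @CategoryID, @Name, @Price);

    COMMIT TRANSACTION;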

Obviously, the second approach avoids deleting the data first, but it involves many more database queries, although this could be mitigated by doing an initial select into a hash table and querying the hash instead. The big downside of the second approach is handling deletions, i.e. records that no longer appear in the XML and should be removed from the table; a sketch of one way to do that follows.
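
A sketch of the second approach that also covers the deletions, again with hypothetical names: stage the parsed XML records in a temp table, update and insert from it, then delete the rows for this category that are absent from the staged set.

    BEGIN TRANSACTION;

    -- Stage the records parsed from the XML file.
    CREATE TABLE #Incoming
    (
        Id    INT PRIMARY KEY,
        Name  NVARCHAR(100),
        Price DECIMAL(10, 2)
    );
    -- ... bulk-insert the parsed records into #Incoming here ...

    -- Update records that already exist.
    UPDATE p
    SET p.Name  = i.Name,
        p.Price = i.Price
    FROM Products AS p
    JOIN #Incoming AS i ON i.Id = p.Id;

    -- Insert records that do not exist yet.
    INSERT INTO Products (Id, Category, Name, Price)
    SELECT i.Id, @CategoryID, i.Name, i.Price
    FROM #Incoming AS i
    WHERE NOT EXISTS (SELECT 1 FROM Products AS p WHERE p.Id = i.Id);

    -- Delete records for this category that no longer appear in the XML.
    DELETE FROM Products
    WHERE Category = @CategoryID
      AND Id NOT IN (SELECT Id FROM #Incoming);

    COMMIT TRANSACTION;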

What is the best practice for handling this kind of operation?

Thanks.

It sounds like you're looking for an upsert or merge – Dan Pichelman Jul 10 '15 at 15:02
Recommended reading: Why is asking a question on “best practice” a bad thing? This is purely a wording issue: instead of appealing to the authority of "best practice," simply state what your requirements are and where you are not sure how to proceed. – Snowman Jul 10 '15 at 15:04
@DanPichelman At least in SQL Server, a MERGE statement can have unintended consequences (there are some lingering bugs with its implementation, and it also unconditionally updates all rows that it doesn't insert, which may not be intuitive at first sight). – mgw854 Jul 10 '15 at 18:33
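
For reference, a minimal sketch of the MERGE approach the comments mention (SQL Server syntax; it assumes the incoming records were staged in a hypothetical #Incoming temp table, and the other names are placeholders too). Note that, as mgw854 points out, the WHEN MATCHED branch updates every matched row unless you add a predicate comparing the columns:

    MERGE Products AS target
    USING #Incoming AS source
        ON target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET target.Name  = source.Name,
                   target.Price = source.Price
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Category, Name, Price)
        VALUES (source.Id, @CategoryID, source.Name, source.Price)
    WHEN NOT MATCHED BY SOURCE AND target.Category = @CategoryID THEN
        DELETE;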

Since you have the primary key of each record, I highly recommend using the approach of examining each record individually.

  • Primary key lookups are extremely fast in any professional-grade database.

  • Updating a record based on primary key has a very granular lock level and is fast.

  • By inserting or updating rather than deleting and reinserting, you maintain the continuity of the data: at no time is it possible to query the category table and come up empty (unless you are querying a category whose records you have not gotten around to inserting yet).

In short, do not worry about performing extra queries here, because they will execute about as fast as possible. I am also assuming the number of categories is relatively fixed, as compared to, say, logging events or banking transactions, which continually increase the amount of data.
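
A sketch of that per-record pattern, with hypothetical names as before; each record parsed from the XML is looked up by primary key and then updated or inserted:

    -- Executed once per XML record, e.g. from application code or a stored procedure.
    IF EXISTS (SELECT 1 FROM Products WHERE Id = @Id)
    BEGIN
        UPDATE Products
        SET Name = @Name, Price = @Price
        WHERE Id = @Id;
    END
    ELSE
    BEGIN
        INSERT INTO Products (Id, Category, Name, Price)
        VALUES (@Id, @CategoryID, @Name, @Price);
    END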


I agree with Snowman and think you should do the individual queries. Besides the performance cost being negligible, you can gain some insight into the data changes that may benefit you and/or your users:

  1. New categories
  2. Removed categories
  3. Categories that have changed

Someone is eventually going to ask what happened, and "we got a new file" won't be enough of an answer.
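
One way to capture change sets like these at the record level while processing, assuming the same hypothetical #Incoming staging table and Products schema as in the sketches above:

    -- Records that are new in this file:
    SELECT i.Id FROM #Incoming AS i
    EXCEPT
    SELECT p.Id FROM Products AS p WHERE p.Category = @CategoryID;

    -- Records this file removes:
    SELECT p.Id FROM Products AS p WHERE p.Category = @CategoryID
    EXCEPT
    SELECT i.Id FROM #Incoming AS i;

    -- Records whose data has changed:
    SELECT i.Id, i.Name, i.Price
    FROM #Incoming AS i
    JOIN Products AS p ON p.Id = i.Id
    WHERE p.Name <> i.Name OR p.Price <> i.Price;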

