I want to design an API, backed by a database, that sends data to a client. The choice of database doesn't really matter, but to make the discussion more interesting, let's say it's MongoDB (explanation below).
The database contains several types of records, and some of them reference records of other types. It's not uncommon for a record to be referenced by several records of different types, so the data in the DB is normalized.
What are the considerations in designing an API server which sends records to the client?
Two options come to mind (you're invited to suggest more or correct me on those):
- The API is granular. Send normalized data. Let the client ask for more records based on what it receives. The client may have a cache, it may decide it doesn't need to ask the server for everything.
- The API sends all the records the client might possibly need based on the requested data. Thus effectively denormalizing the data.
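To make the two options concrete, here is a sketch of what the same resource might look like in each response shape. The record names (`post`, `author`) are hypothetical, just for illustration:

```python
# Option 1 (normalized): the server returns the record as stored, with a
# reference the client must resolve itself (e.g. via GET /authors/author7).
normalized_response = {
    "id": "post1",
    "title": "Hello",
    "author_id": "author7",  # reference only; client fetches it if needed
}

# Option 2 (denormalized): the server resolves references before responding,
# embedding the referenced record in the payload.
denormalized_response = {
    "id": "post1",
    "title": "Hello",
    "author": {"id": "author7", "name": "Ada"},  # embedded by the server
}
```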
With option 1, the client may need several HTTP requests before it has the complete data. That means more round trips, which may make the total data transfer slower. On the other hand, the server stays simple, and the client can selectively ask only for the records it doesn't already have.
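The client-side logic for option 1 could be sketched roughly like this (a simplified version assuming batched fetch-by-id is available; `fetch_from_server` is a placeholder for the actual HTTP call):

```python
def fetch_records(ids, cache, fetch_from_server):
    """Return records for `ids`, asking the server only for cache misses."""
    missing = [i for i in ids if i not in cache]
    if missing:
        # One batched request for all missing ids (fewer round trips than
        # one request per record).
        for rec in fetch_from_server(missing):
            cache[rec["id"]] = rec
    return [cache[i] for i in ids]
```

A second request for overlapping ids then only transfers what the cache lacks, which is exactly the bandwidth saving option 1 is after.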
With option 2, there are fewer HTTP requests. But we may send the client data it already has (it may have received, and cached, some of the records in a previous request). The server is also more complicated, especially when the database is not an RDBMS: with no joins in Mongo, we have to query the DB more than once to assemble all the data.
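Here is roughly what that server-side assembly looks like. The collections and field names are made up, and plain dicts stand in for the database; with pymongo each lookup would be a separate `find_one`/`find` call, which is where the extra query cost of option 2 comes from:

```python
# Stand-ins for three Mongo collections (in reality: db.posts, db.authors,
# db.tags, each queried separately because there are no joins).
posts = {"p1": {"_id": "p1", "title": "Hi", "author_id": "a1", "tag_ids": ["t1", "t2"]}}
authors = {"a1": {"_id": "a1", "name": "Ada"}}
tags = {"t1": {"_id": "t1", "label": "db"}, "t2": {"_id": "t2", "label": "api"}}

def get_post_denormalized(post_id):
    post = dict(posts[post_id])                            # query 1
    post["author"] = authors[post.pop("author_id")]        # query 2
    post["tags"] = [tags[t] for t in post.pop("tag_ids")]  # query 3 (one $in query in practice)
    return post
```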
Further assumptions:
- The data changes every few days (2-3 times a week). So the client can potentially have a persistent cache.
- The Mongo queries are a bit slow (millions of documents in each collection).
- In each session, about 2 MB of data will be sent to the client.
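Given that the data only changes 2-3 times a week, standard HTTP validation (ETag / `If-None-Match`) might let the client keep its persistent cache cheaply with either option: unchanged data costs a 304 instead of 2 MB. A minimal sketch of that idea (framework-agnostic; the function names are mine):

```python
import hashlib

def make_etag(payload: bytes) -> str:
    # Content-derived ETag: identical payloads always get the same tag.
    return hashlib.sha256(payload).hexdigest()

def respond(payload: bytes, if_none_match):
    """Return (status, body, etag), sending the body only when it changed."""
    etag = make_etag(payload)
    if if_none_match == etag:
        return 304, b"", etag  # client's cached copy is still valid
    return 200, payload, etag
```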