I want to design an API, backed by a database, that sends data to a client. The choice of database doesn't really matter, but to make the discussion more interesting, let's say it's MongoDB (explanation below).
The database contains several types of records, and some of them reference records of other types. It's not uncommon for a record to be referenced by several records of different types, so the data in the DB is normalized.
What are the considerations in designing an API server which sends records to the client?
Two options come to mind (you're invited to suggest more or correct me on those):
- The API is granular. Send normalized data. Let the client ask for more records based on what it receives. The client may have a cache, it may decide it doesn't need to ask the server for everything.
- The API sends all the records the client might possibly need based on the requested data. Thus effectively denormalizing the data.
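To make the two options concrete, here is a sketch of what the same resource might look like in each response shape. The record names (`post`, `author`) are hypothetical, just for illustration:

```python
# Option 1 (normalized): the server returns the record as stored, with a
# reference the client must resolve itself (e.g. via GET /authors/author7).
normalized_response = {
    "id": "post1",
    "title": "Hello",
    "author_id": "author7",  # reference only; client fetches it if needed
}

# Option 2 (denormalized): the server resolves references before responding,
# embedding the referenced record in the payload.
denormalized_response = {
    "id": "post1",
    "title": "Hello",
    "author": {"id": "author7", "name": "Ada"},  # embedded by the server
}
```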
With option 1, the client may need several HTTP requests before it has the complete data. That means more round trips, which may make the total data transfer slower. On the other hand, the server stays simple, and the client can selectively ask only for the records it doesn't already have.
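The client-side logic for option 1 could be sketched roughly like this (a simplified version assuming batched fetch-by-id is available; `fetch_from_server` is a placeholder for the actual HTTP call):

```python
def fetch_records(ids, cache, fetch_from_server):
    """Return records for `ids`, asking the server only for cache misses."""
    missing = [i for i in ids if i not in cache]
    if missing:
        # One batched request for all missing ids (fewer round trips than
        # one request per record).
        for rec in fetch_from_server(missing):
            cache[rec["id"]] = rec
    return [cache[i] for i in ids]
```

A second request for overlapping ids then only transfers what the cache lacks, which is exactly the bandwidth saving option 1 is after.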
With option 2, there are fewer HTTP requests. But we may send the client data it already has (it may have received, and cached, some of the records in a previous request). The server is also more complicated, especially when the database is not an RDBMS: with no joins in Mongo, we have to query the DB more than once to assemble all the data.
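Here is roughly what that server-side assembly looks like. The collections and field names are made up, and plain dicts stand in for the database; with pymongo each lookup would be a separate `find_one`/`find` call, which is where the extra query cost of option 2 comes from:

```python
# Stand-ins for three Mongo collections (in reality: db.posts, db.authors,
# db.tags, each queried separately because there are no joins).
posts = {"p1": {"_id": "p1", "title": "Hi", "author_id": "a1", "tag_ids": ["t1", "t2"]}}
authors = {"a1": {"_id": "a1", "name": "Ada"}}
tags = {"t1": {"_id": "t1", "label": "db"}, "t2": {"_id": "t2", "label": "api"}}

def get_post_denormalized(post_id):
    post = dict(posts[post_id])                            # query 1
    post["author"] = authors[post.pop("author_id")]        # query 2
    post["tags"] = [tags[t] for t in post.pop("tag_ids")]  # query 3 (one $in query in practice)
    return post
```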
Further assumptions:
- The data changes every few days (2-3 times a week). So the client can potentially have a persistent cache.
- The Mongo queries are a bit slow (millions of documents in each collection).
- In each session, about 2 MB of data will be sent to the client.
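Given that the data only changes 2-3 times a week, standard HTTP validation (ETag / `If-None-Match`) might let the client keep its persistent cache cheaply with either option: unchanged data costs a 304 instead of 2 MB. A minimal sketch of that idea (framework-agnostic; the function names are mine):

```python
import hashlib

def make_etag(payload: bytes) -> str:
    # Content-derived ETag: identical payloads always get the same tag.
    return hashlib.sha256(payload).hexdigest()

def respond(payload: bytes, if_none_match):
    """Return (status, body, etag), sending the body only when it changed."""
    etag = make_etag(payload)
    if if_none_match == etag:
        return 304, b"", etag  # client's cached copy is still valid
    return 200, payload, etag
```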