Here's an API concept which could be useful for performance optimisation. It's an example of key-based cache expiry applied to a broader internet-wide context instead of the internal Memcached-style scenario it seems to be mostly used for.
I want to make API calls almost as cacheable as static images. Using a news/feed subscriber as an example, which we might poll hourly, the idea is to send a last-updated timestamp along with each topic (it could just as easily be a version number or checksum):
{
  "username": "Wendy",
  "topics": [
    {
      "name": "tv",
      "updated": 1357647954355
    },
    {
      "name": "movies",
      "updated": 1357648018817
    },
    {
      "name": "music",
      "updated": 1357648028264
    }
  ]
}
To be clear, this resource itself comes directly from the server every time and is not cached on the edge or by the client. It's our subsequent calls for topics that we can aggressively cache, thanks to the timestamp.
Assuming we want to sync all topics, a naive implementation would make N further calls (/topics/tv etc.). But because of the timestamp, we can construct a URL like /topics/tv/1357647954355.json. The client usually doesn't make a call at all if it has already seen (and cached) that version of the resource. Furthermore, even if the version is new to the client, an edge cache (e.g. a reverse proxy like Squid or Varnish, or a service like Cloudflare) has probably seen it before, because some other user has probably already opened the latest version of this topic. So we still bypass the application server; the server only creates the topic JSON once after the underlying resource has been updated. Instead of N+1 calls to the server, the client probably makes far fewer calls, and those calls will rarely hit the app server anyway.
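A minimal client-side sketch of the scheme described above. The names `topic_url`, `sync_topics`, the in-memory `cache` dict and the injected `fetch` callable are all hypothetical, chosen only for illustration; a real client would rely on its HTTP cache rather than a dict.

```python
from typing import Callable, Dict, List


def topic_url(name: str, updated: int) -> str:
    """Build the versioned URL for one topic: /topics/<name>/<updated>.json."""
    return f"/topics/{name}/{updated}.json"


def sync_topics(
    feed: dict,
    cache: Dict[str, str],
    fetch: Callable[[str], str],
) -> List[str]:
    """Fetch only the topic versions that are not already cached.

    `feed` is the uncached subscriber resource shown earlier.
    `cache` maps versioned URLs to response bodies (a stand-in for an HTTP cache).
    `fetch` is whatever actually performs the request (assumed, not specified).
    Returns the list of URLs that had to be fetched.
    """
    fetched = []
    for topic in feed["topics"]:
        url = topic_url(topic["name"], topic["updated"])
        if url not in cache:
            # New version of this topic: one request, then cached indefinitely.
            cache[url] = fetch(url)
            fetched.append(url)
    return fetched
```

On the first poll every topic is fetched; on later polls only topics whose timestamp has changed produce a request, because an unchanged timestamp yields an identical URL.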
Now for my question
All this seems feasible and worth doing, but my question is whether there's any prior art for this kind of thing and, in particular, any HTTP standards to support it. I initially thought of conditional caching (ETags and modified dates), and I think they'd help to optimise this setup further, but I don't believe they are the answer. They are subtly different, because they require calls to be passed through to the application server in order to check whether something has changed. The idea here is the client saying "I already know the latest version, please send that resource back to me". I don't think there's any HTTP standard for it, which is why I propose a URL scheme like /topics/tv/1357647954355.json instead of some ETag-like header. I believe some CDNs work this way, and it would surprise me if there's no real HTTP standard around it.
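To make the contrast with conditional caching concrete, here is a rough sketch of how a server might choose response headers under this scheme. The function name `cache_headers`, the path pattern and the one-year lifetime are assumptions for illustration; only the `Cache-Control` directives themselves are standard HTTP.

```python
import re
from typing import Dict

# Assumed URL shape for a versioned topic: /topics/<name>/<timestamp>.json
VERSIONED_TOPIC = re.compile(r"^/topics/[^/]+/\d+\.json$")
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def cache_headers(path: str) -> Dict[str, str]:
    """Pick caching headers based on whether the URL names a fixed version."""
    if VERSIONED_TOPIC.match(path):
        # The URL identifies one specific version, so its body never changes:
        # clients and edge caches can keep it without revalidating.
        return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}"}
    # Everything else (including the subscriber resource that carries the
    # timestamps) must come from the server every time.
    return {"Cache-Control": "no-store"}
```

With ETags the cache still has to ask the origin "has this changed?"; here the versioned URL never needs that round trip, because a change always shows up as a different URL.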
Update: On reflection, an important special case of this is what a web browser does when it fetches a new HTML page. We know it will immediately request CSS and JS, so the same versioning/timestamp trick can be used to ensure those static resources are cache-friendly. That this trick has not been formalised by the spec gives me confidence that, unfortunately, there is no HTTP standard for it. See http://www.particletree.com/notebook/automatically-version-your-css-and-javascript-files/
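The same trick for static assets can be sketched as follows. This variant derives the version from a checksum of the file contents rather than a timestamp; the helper name `versioned_asset_url`, the `ver` query parameter and the 8-character digest length are illustrative assumptions, not taken from the linked article.

```python
import hashlib


def versioned_asset_url(path: str, content: bytes, length: int = 8) -> str:
    """Append a short content checksum to an asset URL as a version marker.

    The URL changes whenever the file contents change, so a client that has
    cached the old URL is forced to fetch the new one.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    return f"{path}?ver={digest}"
```

The HTML page itself plays the role of the uncached subscriber resource: it is fetched fresh and tells the browser which asset versions to request.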
/whatever.js?ver=12435 when the ver= is ignored but treated as a cache-invalidator by the client that doesn't know it's ignored. – Ross Patterson Jan 9 at 0:26