There's no silver bullet
In practice it depends...
tl;dr - easy solution, use nginx...
Blocking:
For instance, Apache by default uses a blocking scheme where a process is forked for every connection. That means every connection needs its own memory space, and the context-switching overhead grows as the number of connections increases. But the benefit is that once a connection is closed, its context can be disposed of and any/all memory easily reclaimed.
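As a rough illustration (a minimal sketch, not Apache's actual code), a fork-per-connection handler looks something like this; `serve_one_fork` is a hypothetical name and it handles just one connection:

```python
import os, socket

def serve_one_fork(srv):
    """Accept one connection on the listening socket `srv` and fork a
    child process to handle it, prefork-style."""
    conn, _ = srv.accept()
    pid = os.fork()
    if pid == 0:
        # Child: gets its own (copy-on-write) memory space for this connection.
        data = conn.recv(1024)
        conn.sendall(data.upper())
        conn.close()
        os._exit(0)      # exiting the process reclaims all its memory at once
    conn.close()         # parent drops its reference to the accepted socket
    os.waitpid(pid, 0)   # real servers reap children asynchronously (SIGCHLD)
```

The `os._exit(0)` line is the point being made above: cleanup is trivial because killing the process reclaims everything.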
A multi-threaded approach is similar in that the overhead of context switching increases with the number of connections, but it may be more memory-efficient because threads share a single address space. The problem with such an approach is that it's difficult to manage shared memory in a manner that's safe. The techniques for overcoming memory-synchronization problems carry their own overhead; for instance, locking may freeze the main thread under CPU-intensive loads, and using immutable types adds a lot of unnecessary copying of data.
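To make the shared-memory point concrete, here's a toy sketch (an illustrative counter, not a real server): every worker thread touches the same state, so they all serialize on one lock.

```python
import threading

counter = 0
lock = threading.Lock()

def handle_request():
    """Stand-in for per-connection work that updates shared state."""
    global counter
    with lock:          # every worker serializes here; this is the overhead
        counter += 1

threads = [threading.Thread(target=handle_request) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is exactly 100 only because the lock prevents lost updates
```

Without the lock, concurrent `counter += 1` updates can be lost (read-modify-write races); with it, every thread queues up on the same critical section.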
AFAIK, a multi-process approach is generally preferred for a blocking HTTP server because it's safer/simpler to manage and recover memory. Garbage collection becomes a non-issue when reclaiming memory is as simple as stopping a process. For long-running processes (ie a daemon) that characteristic is especially important.
While context-switching overhead may seem insignificant with a small number of workers, the disadvantages become more relevant as the load scales up to hundreds or thousands of concurrent connections. At best, context-switching overhead scales O(n) with the number of workers, but in practice it's most likely worse.
While blocking servers may not be the ideal choice for IO-heavy loads, they are ideal for CPU-intensive work where message passing is kept to a minimum.
Non-Blocking:
Non-blocking would be something like Node.js or nginx. These are especially known for scaling to a much larger number of connections per node under IO-intensive load. Basically, once people hit the upper limit of what thread/process-based servers could handle they started to explore alternative options. This is otherwise known as the C10K problem (ie the ability to handle 10,000 concurrent connections).
Non-blocking async servers generally share a lot of characteristics with a multi-threaded-with-locking approach, in that you have to be careful to avoid CPU-intensive loads because you don't want to overload the main thread. The advantage is that the overhead of context switching is essentially eliminated, and with only one context, message passing becomes a non-issue.
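Here's a small sketch of why that single context scales for IO-bound work (using Python's asyncio for illustration; `fake_request` is a hypothetical stand-in for a network read):

```python
import asyncio, time

async def fake_request(i):
    """Stand-in for one connection's IO wait (e.g. a socket read)."""
    await asyncio.sleep(0.1)   # yields to the event loop instead of blocking
    return i

async def main():
    start = time.monotonic()
    # 1000 "connections" interleaved on one thread: no forks, no locks.
    results = await asyncio.gather(*(fake_request(i) for i in range(1000)))
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
# The waits overlap, so total time is on the order of 0.1s, not 100s.
```

A blocking server would need 1000 workers (and 1000 context switches' worth of overhead) to do the same thing; here one thread just resumes whichever coroutine's IO is ready.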
While it may not work for many networking protocols, HTTP's stateless nature works especially well for non-blocking architectures. By combining a reverse proxy with multiple non-blocking HTTP servers, it's possible to identify and route around nodes experiencing heavy load.
Even on a server that only has one node, it's very common for the setup to include one server per processor core to maximize throughput.
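The one-server-per-core pattern is usually just a fork per core, with each child running its own event loop on a shared listening port (e.g. via SO_REUSEPORT). A minimal sketch, assuming a caller-supplied `run_worker` function (hypothetical name):

```python
import os

def spawn_workers(run_worker):
    """Fork one worker process per CPU core; each child runs run_worker()
    (in a real server, that would start an event loop) and then exits."""
    n = os.cpu_count() or 1
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            run_worker()
            os._exit(0)
        pids.append(pid)
    return pids   # parent supervises: wait on these, respawn on crash
```

This is essentially what nginx's worker_processes and Node's cluster module do for you.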
Both:
The 'ideal' setup would be a combination of both: a reverse proxy at the front dedicated to routing requests, then a mix of blocking and non-blocking servers behind it. Non-blocking for IO-bound tasks like serving static content, cached content, and HTML content; blocking for CPU-heavy tasks like encoding images/video, streaming content, number crunching, and database writes.
In your case:
If you're just checking headers but not actually processing the requests, what you're essentially describing is a reverse proxy. In such a case I'd definitely go with an async approach.
I'd suggest checking out the documentation for the nginx built-in reverse proxy.
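To give a sense of scale, the core of an nginx reverse-proxy config is only a few lines (a sketch; the backend addresses and port are placeholders for your own servers):

```nginx
# Illustrative only: pool of backend servers to route to.
upstream backend {
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;      # forward requests to the pool
        proxy_set_header Host $host;    # preserve the original Host header
    }
}
```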
Aside:
I read the write-up from the link you provided, and it makes sense that async was a poor choice for their particular implementation. The issue can be summed up in one statement:
Found that when switching between clients, the code for saving and restoring values/state was difficult
They were building a stateful platform. In such a case, an async approach means constantly saving/loading state every time the context switches (ie when an event fires). In addition, on the SMTP side they're doing a lot of CPU-intensive work.
It sounds like they had a pretty poor grasp of async and, as a result, made a lot of bad assumptions.