
When a web page contains a single CSS file and an image, why do browsers and servers waste time with this traditional time-consuming route:

  1. Browser sends an initial GET request for the web page and waits for the server's response.
  2. Browser sends another GET request for the CSS file and waits for the server's response.
  3. Browser sends another GET request for the image file and waits for the server's response.

When instead they could use this short, direct, time-saving route?

  1. Browser sends a GET request for a web page.
  2. The web server responds with (index.html followed by style.css and image.jpg)
No request can be made until the web page itself is fetched, of course. After that, requests are made in order as the HTML is read, but that does not mean only one request is made at a time. In fact, several requests are made, though sometimes there are dependencies between them and some have to be resolved before the page can be properly painted. Browsers do sometimes pause as a request is satisfied before appearing to handle other responses, making it appear that each request is handled one at a time. The reality is that the bottleneck is more on the browser side, as browsers tend to be resource intensive. –  closetnoc 2 days ago
I'm surprised nobody mentioned caching. If I already have that file I don't need it sent to me. –  Corey Ogburn 2 days ago
This list could be hundreds of things long. Although shorter than actually sending the files, it's still pretty far from an optimal solution. –  Corey Ogburn 2 days ago
Actually, I have never visited a web page that has more than 100 unique resources. –  Ahmed Elsobky 2 days ago
@AhmedElsoobky: the browser doesn't know what resources can be sent as a cached-resources header without first retrieving the page itself. It would also be a privacy and security nightmare if retrieving a page told the server that I have another page cached, possibly controlled by a different organization than the original page (a multi-tenant website). –  Lie Ryan 2 days ago

5 Answers

Accepted answer

The short answer is "Because HTTP wasn't designed for it".

Tim Berners-Lee did not design an efficient and extensible network protocol. His one design goal was simplicity. (The professor of my networking class in college said that he should have left the job to the professionals.) The problem that you outline is just one of the many problems with the HTTP protocol. In its original form:

  • There was no protocol version, just a request for a resource
  • There were no headers
  • Each request required a new TCP connection
  • There was no compression

The protocol was later revised to address many of these problems:

  • Requests were versioned; now a request looks like GET /foo.html HTTP/1.1
  • Headers were added for meta information with both the request and response
  • Connections were allowed to be reused with Connection: keep-alive
  • Chunked responses were introduced to allow connections to be reused even when the document size is not known ahead of time.
  • Gzip compression was added

At this point HTTP has been taken about as far as it can go without breaking backwards compatibility.
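As a concrete illustration of what the keep-alive revision buys, here is a minimal Python sketch (standard library only; the host name and paths are placeholders) that fetches three resources over a single reused TCP connection:

    # Minimal sketch of HTTP/1.1 connection reuse with Python's standard library.
    # "example.com" and the paths are placeholders; any HTTP/1.1 server will do.
    import http.client

    conn = http.client.HTTPConnection("example.com")   # one TCP connection

    for path in ("/index.html", "/style.css", "/image.jpg"):
        conn.request("GET", path)          # HTTP/1.1 keeps the connection open by default
        response = conn.getresponse()
        body = response.read()             # drain the body before reusing the connection
        print(path, response.status, len(body), "bytes")

    conn.close()

Under the original protocol, each of those three requests would have paid for its own TCP handshake.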

You are not the first person to suggest that a page and all its resources should be pushed to the client. In fact, Google designed a protocol that can do exactly that, called SPDY.

Today both Chrome and Firefox can use SPDY instead of HTTP when talking to servers that support it. From the SPDY website, its main features compared to HTTP are:

  • SPDY allows client and server to compress request and response headers, which cuts down on bandwidth usage when the similar headers (e.g. cookies) are sent over and over for multiple requests.
  • SPDY allows multiple, simultaneously multiplexed requests over a single connection, saving on round trips between client and server, and preventing low-priority resources from blocking higher-priority requests.
  • SPDY allows the server to actively push resources to the client that it knows the client will need (e.g. JavaScript and CSS files) without waiting for the client to request them, allowing the server to make efficient use of unutilized bandwidth.

If you want to serve your website with SPDY to browsers that support it, you can do so. For example, Apache has mod_spdy.
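Whether a particular server speaks SPDY (or its successor, HTTP/2) is agreed upon during the TLS handshake; SPDY originally used the NPN extension, and modern servers advertise HTTP/2 via ALPN. Here is a rough Python sketch of that negotiation, with www.google.com used purely as a placeholder host and a made-up protocol list:

    # Sketch: ask a TLS server which application protocol it is willing to speak.
    # The host name is just a placeholder; the offered protocol list is made up.
    import socket
    import ssl

    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "spdy/3.1", "http/1.1"])

    with socket.create_connection(("www.google.com", 443)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname="www.google.com") as tls:
            print("negotiated:", tls.selected_alpn_protocol())   # e.g. "h2"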

Dang good and informed answer! Web browsers are serial by nature and requests can be made rather rapidly. One look at a log file will show that requests for resources are made rather quickly once the HTML has been parsed. It is what it is. Not a bad system, just not as code/resource efficient as it could be. –  closetnoc 2 days ago
Just for the record, SPDY is not the holy grail. It does some things well, but introduces other problems. Here is one article containing some points speaking against SPDY. –  Jost yesterday
I highly recommend that anyone interested in this read the criticisms in @Jost 's link. It gives you a hint of the complexity involved in figuring out how to do a very commonly implemented thing not just incrementally better but rather so much better that everyone starts using it. It's easy to think of an improvement that makes things somewhat better for a relatively large subset of use cases. To make things better in such a way that everyone starts using your new protocol because it's so much better that it's worth the cost of changing is another matter entirely, and not easy to do. –  msouth yesterday
he should have left the job to the professionals: If he had done that, they would have taken six years to come up with a standard which would have been obsolete the day it came out, and soon a dozen competing standards would have appeared. Besides, did the professionals need permission from someone? Why didn't they do it themselves? –  Shantnu Tiwari yesterday
@ShantnuTiwari: The usual comparison is with TCP/IP v4, which disproves your FUD about professionals. That's so good that people use NAT just so they can avoid TCP/IP v6. However, professionals prefer to be paid for their work. Nobody ordered HTTP. Mr Berners-Lee was paid out of the CERN budget and did it pretty much on the side (HTTP isn't exactly quantum physics) –  MSalters yesterday

Because they do not know what those resources are. The assets a web page requires are coded into the HTML. Only after a parser determines what those assets are can they be requested by the user-agent.

Additionally, once those assets are known, they need to be served individually so the proper headers (e.g. Content-Type) can be sent and the user-agent knows how to handle them.
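To give a sense of what that parsing step involves, here is a minimal Python sketch using the standard library's html.parser; the HTML snippet is a made-up example:

    # Minimal sketch of the parsing a user-agent must do before it can even know
    # that style.css and image.jpg exist.
    from html.parser import HTMLParser

    class AssetCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.assets = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "link" and "href" in attrs:
                self.assets.append(attrs["href"])
            elif tag in ("img", "script") and "src" in attrs:
                self.assets.append(attrs["src"])

    collector = AssetCollector()
    collector.feed('<html><head><link rel="stylesheet" href="style.css"></head>'
                   '<body><img src="image.jpg"></body></html>')
    print(collector.assets)   # ['style.css', 'image.jpg']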

Especially if you use something like require.js. The browser only asks for what it needs. Imagine having to load everything all at once... –  Aran Mulholland yesterday
This is the right answer, and one that most of the commenters seem to be missing - in order for the server to proactively send the resources, it needs to know what they are, which means the server would have to parse the HTML. –  GalacticCowboy yesterday
But the question asks why the web server does not send the resources, not why the client can't ask for them at the same time. It's very easy to imagine a world where servers have a package of related assets that are all sent together that does not rely on parsing HTML to build the package. –  David Meister 18 hours ago

Your web browser doesn't know about the additional resources until it downloads the web page (HTML) from the server, which contains the links to those resources.

You might be wondering, why doesn't the server just parse its own HTML and send all the additional resources to the web browser during the initial request for the web page? It's because the resources might be spread across multiple servers, and the web browser might not need all those resources since it already has some of them cached, or may not support them.

The web browser maintains a cache of resources so it does not have to download the same resources over and over from the servers that host them. When navigating different pages on a website that all use the same jQuery library, you don't want to download that library every time, just the first time.

So when the web browser gets a web page from the server, it checks what linked resources it DOESN'T already have in the cache, then makes additional HTTP requests for those resources. Pretty simple, very flexible and extensible.
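As a rough sketch of that cache check in code, this is what revalidating a cached copy looks like with Python's standard library; example.com is a placeholder and the server is assumed to send an ETag validator:

    # Sketch of cache revalidation: re-request a resource with a validator and let
    # the server answer "304 Not Modified" instead of resending the whole body.
    import urllib.error
    import urllib.request

    first = urllib.request.urlopen("http://example.com/")
    etag = first.headers.get("ETag")
    cached_body = first.read()

    revalidate = urllib.request.Request(
        "http://example.com/",
        headers={"If-None-Match": etag} if etag else {},
    )
    try:
        urllib.request.urlopen(revalidate)
        print("200: the server sent the full body again")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            print("304: the cached copy is still valid; nothing was re-downloaded")
        else:
            raise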

A web browser can usually make two HTTP requests in parallel. This is not unlike AJAX - both are asynchronous methods for loading web pages: asynchronous file loading and asynchronous content loading. With keep-alive, we can make several requests using one connection, and with pipelining we can make several requests without having to wait for responses. Both techniques are very fast because most of the overhead usually comes from opening and closing TCP connections:

[Diagrams: keep-alive and pipelining]
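To make the pipelining idea concrete, here is a rough raw-socket Python sketch that writes two requests before reading any response; example.com is a placeholder, and note that many real servers and proxies handle pipelining poorly, which is why browsers largely gave up on it:

    # Rough sketch of HTTP/1.1 pipelining: two requests are written back to back
    # before any response is read; the responses come back in the same order.
    import socket

    requests = (
        "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
        "GET /style.css HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
    )

    with socket.create_connection(("example.com", 80)) as sock:
        sock.sendall(requests.encode("ascii"))   # both requests in a single write
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                         # server closes after the second response
                break
            chunks.append(data)

    raw = b"".join(chunks)
    print(raw.count(b"HTTP/1.1 "), "status lines received on one connection")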

A bit of web history...

Web pages started out as plain-text email, with computer systems being engineered around this idea, forming a somewhat free-for-all communication platform; web servers were still proprietary at the time. Later, more layers were added to the "email spec" in the form of additional MIME types, such as images, styles, scripts, etc. After all, MIME stands for Multipurpose Internet Mail Extensions. Sooner or later we had what is essentially multimedia email communication, standardized web servers, and web pages.

HTTP requires that data be transmitted in the context of email-like messages, although the data most often is not actually email.

As technology like this evolves, it needs to allow developers to progressively incorporate new features without breaking existing software. For example, when a new MIME type is added to the spec - let's say JPEG - it will take some time for web servers and web browsers to implement that. You don't just suddenly force JPEG into the spec and start sending it to all web browsers, you allow the web browser to request the resources that it supports, which keeps everyone happy and the technology moving forward. Does a screen reader need all the JPEGs on a web page? Probably not. Should you be forced to download a bunch of Javascript files if your device doesn't support Javascript? Probably not. Does Googlebot need to download all your Javascript files in order to index your site properly? Nope.

Source: I've developed an event-based web server like Node.js. It's called Rapid Server.

Well, actually, we can take care of all those side problems (things like the cache, the Content-Type header, etc.); there are workarounds. And as I suggested in the comments on the post above, we can use something like a Cached-Resources: image.jpg; style.css header to solve the caching problem (if you have time, take a look at the comments above). –  Ahmed Elsobky yesterday
Yes, that idea had crossed my mind before, but it's simply too much overhead for HTTP and it doesn't solve the fact that resources may be spread across multiple servers. Furthermore, I don't think your proposed time-saving method would actually save time, because data is going to be sent as a stream no matter how you look at it, and with keep-alive, 100 HTTP requests essentially become one connection. The technology and capability you propose seem to already exist in a way. See en.wikipedia.org/wiki/HTTP_persistent_connection –  perry yesterday
@perry: What would you think of the idea of an alternative to https:// for sending large publicly-distributed files that need to be authenticated but not kept confidential: include in the URL a hash of certain parts of a legitimate reply's header, which could in turn include either a signature or a hash of the data payload, and have browsers validate the received data against the header? Such a design would not only save some SSL handshake steps, but would more importantly allow caching proxies. Get the URL via an SSL link, and the data could be fed from anywhere. –  supercat 12 hours ago

HTTP/2 is based on SPDY and does exactly what you suggest:

At a high level, HTTP/2:

  • is binary, instead of textual
  • is fully multiplexed, instead of ordered and blocking
  • can therefore use one connection for parallelism
  • uses header compression to reduce overhead
  • allows servers to “push” responses proactively into client caches

More is available in the HTTP/2 FAQ.
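To make that last point concrete, here is a hedged, in-memory sketch of server push using the third-party Python h2 library (assumed to be installed via pip install h2); no sockets are involved, the two connection objects simply exchange bytes, and the file names and header values are made up:

    # In-memory sketch of HTTP/2 server push with the third-party "h2" library.
    # The client asks for one page; the server volunteers the stylesheet as well.
    import h2.config
    import h2.connection

    client = h2.connection.H2Connection(h2.config.H2Configuration(client_side=True))
    server = h2.connection.H2Connection(h2.config.H2Configuration(client_side=False))
    client.initiate_connection()
    server.initiate_connection()

    # Exchange the connection prefaces and SETTINGS frames.
    server.receive_data(client.data_to_send())
    client.receive_data(server.data_to_send())

    # The client requests the page on stream 1 (client-initiated streams are odd).
    client.send_headers(1, [(":method", "GET"), (":path", "/index.html"),
                            (":scheme", "https"), (":authority", "example.com")],
                        end_stream=True)
    server.receive_data(client.data_to_send())

    # The server answers the request AND pushes style.css without being asked.
    server.push_stream(stream_id=1, promised_stream_id=2,
                       request_headers=[(":method", "GET"), (":path", "/style.css"),
                                        (":scheme", "https"), (":authority", "example.com")])
    server.send_headers(1, [(":status", "200"), ("content-type", "text/html")])
    server.send_data(1, b"<html>...</html>", end_stream=True)
    server.send_headers(2, [(":status", "200"), ("content-type", "text/css")])
    server.send_data(2, b"body { color: red }", end_stream=True)

    # The client sees the pushed stream arrive alongside the normal response.
    for event in client.receive_data(server.data_to_send()):
        print(type(event).__name__)   # includes PushedStreamReceived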


Because it doesn't assume that these things are actually required.

The protocol doesn't define any special handling for any particular type of file or user-agent. It does not know the difference between, say, an HTML file and a PNG image. In order to do what you're asking, the Web server would have to identify the file type, parse it out to figure out what other files it's referencing, and then determine which other files are actually needed, given what you intend to do with the file. There are three big problems with this.

The first problem is that there is no standard, robust way to identify file types on the server end. HTTP manages via the Content-Type mechanism, but that doesn't help the server, which has to figure this stuff out on its own (partly so that it knows what to put into the Content-Type). Filename extensions are widely supported, but fragile and easily fooled, sometimes for malicious purposes. Filesystem metadata is less fragile, but most systems don't support it very well, so the servers don't even bother. Content sniffing (as some browsers and the Unix file command try to do) can be robust if you're willing to make it expensive, but robust sniffing is too expensive to be practical on the server side, and cheap sniffing isn't robust enough.
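A small sketch of that fragility, using only Python's standard library (the file name is a throwaway created by the script): the extension claims one thing, the bytes say another.

    # Sketch of why extension-based type identification is fragile: the name
    # claims PNG, the contents do not.
    import mimetypes

    print(mimetypes.guess_type("photo.png")[0])      # image/png -- guessed from the name alone

    with open("photo.png", "wb") as f:               # but write HTML bytes under that name
        f.write(b"<html><body>not an image at all</body></html>")

    with open("photo.png", "rb") as f:
        header = f.read(8)

    # Even the cheapest content sniffing disagrees with the extension.
    print(header.startswith(b"\x89PNG\r\n\x1a\n"))   # False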

The second problem is that parsing a file is expensive, computationally-speaking. This ties into the first one somewhat, in that you'd need to parse the file in a bunch of different potential ways if you wanted to sniff the content robustly, but it also applies after you've identified the file type, because you need to figure out what the references are. This isn't so bad when you're only doing a few files at a time, like the browser does, but a Web server has to handle hundreds or thousands of requests at once. This adds up, and if it goes too far, it can actually slow things down more than multiple requests would. If you've ever visited a link from Slashdot or similar sites, only to find that the server is agonizingly slow due to high usage, you've seen this principle in action.

The third problem is that the server has no way to know what you intend to do with the file. A browser might need the files being referenced in the HTML, but it might not, depending on the exact context in which the file is being executed. That would be complex enough, but there's more to the Web than just browsers: between spiders, feed aggregators, and page-scraping mashups, there are many kinds of user-agents that have no need for the files being referenced in the HTML: they only care about the HTML itself. Sending these other files to such user-agents would only waste bandwidth.

The bottom line is that figuring out these dependencies on the server side is more trouble than it's worth. So instead, they let the client figure out what it needs.

If we are going to develop a new protocol or fix an existing one, we can take care of all these problems one way or another! The web server would parse files only once, then classify them according to defined rules so it can prioritize which files to send first, etc., and the server doesn't have to know what I intend to do with those files; it just has to know what to send, when, and according to which rules (web bots and spiders aren't a problem; the behavior would be different with them, since they have unique user-agent headers). –  Ahmed Elsobky yesterday
@AhmedElsobky: What you're talking about sounds more like a specific implementation than a network protocol. But it really does have to know what you intend to do with the files before it can determine what to send: otherwise it will inevitably send files that many users don't want. You can't trust User-Agent strings, so you can't use them to determine what the user's intent is. –  The Spooniest yesterday
