Caching Tutorial
By Mark Nottingham
2007-12-22
How (and how not) to Control Caches
There are several tools that Web designers and Webmasters can use to
fine-tune how caches will treat their sites. It may require getting your hands
a little dirty with your server’s configuration, but the results are worth it.
For details on how to use these tools with your server, see the Implementation sections below.
HTML authors can put tags in a document’s <HEAD> section that
describe its attributes. These meta tags are often used in the
belief that they can mark a document as uncacheable, or expire it at a
certain time.
Meta tags are easy to use, but aren’t very effective. That’s because
they’re only honored by a few browser caches (which actually read the HTML),
not proxy caches (which almost never read the HTML in the document). While it
may be tempting to put a Pragma: no-cache meta tag into a Web page, it won’t
necessarily cause it to be kept fresh.
Side Note:
===============
If your site is hosted at an ISP or hosting farm and they
don’t give you the ability to set arbitrary HTTP headers (like Expires and
Cache-Control), complain loudly; these are tools necessary for doing your
job.
===============
On the other hand, true HTTP headers give you a lot of control
over how both browser caches and proxies handle your representations. They
can’t be seen in the HTML, and are usually automatically generated by the Web
server. However, you can control them to some degree, depending on the server
you use. In the following sections, you’ll see what HTTP headers are
interesting, and how to apply them to your site.
HTTP headers are sent by the server before the HTML, and only seen by the
browser and any intermediate caches. Typical HTTP 1.1 response headers might
look like this:
HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html
The HTML would follow these headers, separated by a blank
line. See the Implementation sections for information about how to set HTTP
headers.
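As a rough illustration of how a browser or cache sees these headers, the sketch below splits a raw response at the blank line and reads the header fields into a dictionary. The function name parse_response and the sample body are hypothetical, and real HTTP parsers handle cases (repeated fields, malformed lines) that this ignores.

```python
def parse_response(raw: str):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The headers end at the first blank line; everything after is the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header field names are case-insensitive, so normalise them.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


RAW = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 30 Oct 1998 13:19:41 GMT\r\n"
    "Cache-Control: max-age=3600, must-revalidate\r\n"
    "Expires: Fri, 30 Oct 1998 14:19:41 GMT\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>...</html>"
)

status, headers, body = parse_response(RAW)
print(status)
print(headers["cache-control"])
```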
Many people believe that assigning a Pragma: no-cache HTTP header to a
representation will make it uncacheable. This is not necessarily true: the
HTTP specification defines Pragma only for request headers (the headers that
a browser sends to a server) and sets no guidelines for Pragma response
headers. Although a few caches may honor this header in responses, the
majority won’t, and it won’t have any effect. Use the headers below instead.
The Expires HTTP header is a basic means of controlling caches; it tells
all caches how long the associated representation is fresh for. After that
time, caches will always check back with the origin server to see if the
document has changed. Expires headers are supported by practically every
cache.
Most Web servers allow you to set Expires response headers in a number of
ways. Commonly, they will allow setting an absolute time to expire, a time
based on the last time that the client saw the representation (last access
time), or a time based on the last time the document changed on your
server (last modification time).
Expires headers are especially good for making static images (like
navigation bars and buttons) cacheable. Because they don’t change much, you
can set an extremely long expiry time on them, making your site appear much more
responsive to your users. They’re also useful for controlling caching of a
page that is regularly changed. For instance, if you update a news page once a
day at 6am, you can set the representation to expire at that time, so caches
will know when to get a fresh copy, without users having to hit ‘reload’.
The only valid value in an Expires header is an HTTP date;
anything else will most likely be interpreted as ‘in the past’, so that the
representation is uncacheable. Also, remember that the time in an HTTP date is
Greenwich Mean Time (GMT), not local time.
For example:
Expires: Fri, 30 Oct 1998 14:19:41 GMT
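A minimal sketch of producing such a value with Python's standard library is shown here. The helper names http_date and next_daily_expiry are hypothetical, and the daily update hour is assumed to be expressed in GMT; a site that updates at 6am local time would need to convert first.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def http_date(moment: datetime) -> str:
    """Format an aware datetime as an HTTP date, always in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def next_daily_expiry(now: datetime, hour: int = 6) -> str:
    """Return an Expires value for the next occurrence of hour:00 GMT."""
    now = now.astimezone(timezone.utc)
    expiry = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if expiry <= now:
        # Today's update time has already passed, so use tomorrow's.
        expiry += timedelta(days=1)
    return http_date(expiry)


sample = datetime(1998, 10, 30, 13, 19, 41, tzinfo=timezone.utc)
one_hour = http_date(sample + timedelta(hours=1))
daily = next_daily_expiry(sample)
print("Expires:", one_hour)
print("Expires:", daily)
```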
Side Note:
=================
It’s important to make sure that your Web
server’s clock is accurate if you use the Expires header.
One way to do this is using the Network Time
Protocol (NTP); talk to your local system administrator to find out
more.
=================
Although the Expires header is useful, it has some limitations. First,
because there’s a date involved, the clocks on the Web server and the cache
must be synchronised; if they have a different idea of the time, the intended
results won’t be achieved, and caches might wrongly consider stale content as
fresh.
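The effect of clock skew can be sketched from the cache's side. The function is_fresh below is a hypothetical simplification of what a cache does with Expires: it compares the header against the cache's own clock and treats anything that does not parse as an HTTP date as already expired.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_value: str, cache_now: datetime) -> bool:
    """Return True if the cache's clock says the Expires time has not passed."""
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # Anything that is not a valid HTTP date counts as already expired.
        return False
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)
    return cache_now < expires


expires_header = "Fri, 30 Oct 1998 14:19:41 GMT"
true_now = datetime(1998, 10, 30, 15, 0, 0, tzinfo=timezone.utc)
skewed_now = true_now - timedelta(hours=2)  # a cache whose clock runs two hours slow

print(is_fresh(expires_header, true_now))    # accurate clock
print(is_fresh(expires_header, skewed_now))  # slow clock
print(is_fresh("0", true_now))               # not an HTTP date
```

With an accurate clock the representation is correctly seen as stale; the cache with the slow clock still considers it fresh, which is the failure described above.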
Another problem with Expires is that it’s easy to forget that you’ve set
some content to expire at a particular time. If you don’t update an Expires
time before it passes, each and every request will go back to your Web server,
increasing load and latency.
HTTP 1.1 introduced a new class of headers, Cache-Control response
headers, to give Web publishers more control over their content, and
to address the limitations of Expires.
Useful Cache-Control response headers include:
max-age=[seconds]: specifies the maximum amount of time that a
    representation will be considered fresh. It plays the same role as
    Expires, but is relative to the time of the request rather than absolute.
    [seconds] is the number of seconds from the time of the request you wish
    the representation to be fresh for.
s-maxage=[seconds]: similar to max-age, except that it only applies to
    shared (e.g., proxy) caches.
public: marks authenticated responses as cacheable; normally, if HTTP
    authentication is required, responses are automatically uncacheable.
no-cache: forces caches to submit the request to the origin server for
    validation before releasing a cached copy, every time. This is useful to
    assure that authentication is respected (in combination with public), or
    to maintain rigid freshness, without sacrificing all of the benefits of
    caching.
no-store: instructs caches not to keep a copy of the representation under
    any conditions.
must-revalidate: tells caches that they must obey any freshness information
    you give them about a representation. HTTP allows caches to serve stale
    representations under special conditions; by specifying this directive,
    you’re telling the cache that you want it to strictly follow your rules.
proxy-revalidate: similar to must-revalidate, except that it only applies
    to proxy caches.
For example:
Cache-Control: max-age=3600, must-revalidate
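To make the directives concrete, here is a small sketch of how a cache might read this header and pick a freshness lifetime. Both function names are hypothetical, and the parser is deliberately simplified: it does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; flags have no argument.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(directives: dict, shared_cache: bool):
    """Return the freshness lifetime in seconds, or None if none is given."""
    # s-maxage only applies to shared caches and takes precedence there.
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    return None


example = parse_cache_control("max-age=3600, must-revalidate")
print(example)
print(freshness_lifetime(example, shared_cache=True))
```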
If you plan to use the Cache-Control headers, you should have a look at
the excellent documentation in HTTP 1.1; see References and Further Information.
In How Web Caches Work, we said that validation is used
by servers and caches to communicate when a representation has changed. By
using it, caches avoid having to download the entire representation when they
already have a copy locally, but they’re not sure if it’s still fresh.
Validators are very important; if one isn’t present, and there isn’t any
freshness information (Expires or Cache-Control) available, caches will
not store a representation at all.
The most common validator is the time that the document last changed, as
communicated in the Last-Modified header. When a cache has a
representation stored that includes a Last-Modified header, it can use it to
ask the server if the representation has changed since the last time it was
seen, with an If-Modified-Since request.
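On the server side, answering such a request comes down to one date comparison. The sketch below assumes a hypothetical helper, conditional_status, that returns 304 (Not Modified) when the cache's copy is still current and 200 when the full representation must be sent again.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(last_modified: datetime, if_modified_since) -> int:
    """Return 304 if the cached copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # Not a conditional request.
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # Ignore an unparseable date and send the full response.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


changed_at = datetime(1998, 6, 29, 2, 28, 12, tzinfo=timezone.utc)
print(conditional_status(changed_at, "Mon, 29 Jun 1998 02:28:12 GMT"))
print(conditional_status(changed_at, "Sun, 28 Jun 1998 00:00:00 GMT"))
```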
HTTP 1.1 introduced a new kind of validator called the ETag. ETags
are unique identifiers that are generated by the server and changed every time
the representation does. Because the server controls how the ETag is
generated, caches can be surer that if the ETag matches when they make an
If-None-Match request, the representation really is the same.
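One possible way to generate and check ETags is sketched below. Hashing the content is an assumption made for illustration; servers are free to use other schemes (the example ETag shown earlier is not a content hash). The names make_etag and etag_status are hypothetical, and weak validators are not handled.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the representation's bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def etag_status(current_etag: str, if_none_match) -> int:
    """Return 304 if any ETag the cache sent matches the current one, else 200."""
    if if_none_match is None:
        return 200  # Not a conditional request.
    # A cache may send several ETags separated by commas, or "*" for any.
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates or current_etag in candidates:
        return 304
    return 200


old_tag = make_etag(b"<html>version 1</html>")
new_tag = make_etag(b"<html>version 2</html>")
print(etag_status(old_tag, old_tag))  # unchanged representation
print(etag_status(new_tag, old_tag))  # representation has changed
```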
Almost all caches use Last-Modified times in determining if a
representation is fresh; ETag validation is also becoming prevalent.
Most modern Web servers will generate both ETag and Last-Modified
headers to use as validators for static content (i.e., files) automatically; you won’t have to
do anything. However, they don’t know enough about dynamic content (like CGI,
ASP or database sites) to generate them; see Writing
Cache-Aware Scripts.
