
Cache-Control directives, what they are and how they work

Index:

 

  • What is cache
  • There are multiple types of caches
  • HTTP Cache-Control header
  • Syntax
  • Cancel and update cached responses
  • Validating cached files with ETag
  • Caching strategies

What is cache

A cache is a system for reusing resources the browser has already downloaded, which reduces the web server’s CPU and bandwidth consumption and speeds up page loading.

Retrieving files over the network is slow and costly in bandwidth, CPU and disk usage. Generating even a simple web page often requires numerous HTTP calls between client and server, and these calls affect the user experience, loading times and the time the browser needs to process the page. Consequently, the ability to cache and reuse previously retrieved resources is a key aspect of optimizing a website’s performance.

Every browser has a built-in HTTP cache. You just have to make sure that each web server response provides the correct HTTP headers to tell the browser how, when and for how long the response can be cached.

There are multiple types of caches

A cached copy of a web page, or of a CSS file for example, can exist at different levels of the web server > network > client path.

  • A cached copy can exist at the web server level, for example using FastCGI, Varnish or other comparable solutions.
  • A cached copy can exist on a CDN, that is, in the network that connects the client to the server, for example using Cloudflare.
  • Finally, a cached copy can exist in the browser, which saves files and pages to speed up subsequent page views.

This guide covers the HTTP Cache-Control header, which tells the browser what to keep in its cache, so the focus is on the client-side cache.

HTTP Cache-Control header

HTTP headers allow the client and server to pass additional information with the request or response. An HTTP header consists of its case-insensitive name, followed by a colon ‘:’, then its value without line breaks. Leading whitespace before the value is ignored.
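The header format described above can be sketched in a few lines of Python. This is only an illustration: the function name parse_header_line is hypothetical, and a real HTTP library handles many more cases.

```python
def parse_header_line(line):
    """Split one raw header line into a (name, value) pair."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a header line: {line!r}")
    # Header names are case-insensitive, so normalize them to lower case.
    # Whitespace surrounding the value is not part of the value itself.
    return name.strip().lower(), value.strip()


print(parse_header_line("Cache-Control:   max-age=3600"))
# ('cache-control', 'max-age=3600')
```

Normalizing the name to lower case makes it possible to compare headers regardless of how the sender capitalized them.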

A typical example of an HTTP response header is Cache-Control: max-age=3600.

Any type of file can be cached: HTML pages, CSS files, fonts, JavaScript and images. The caching mode of each resource can be defined via the HTTP Cache-Control header. Cache-Control directives determine who can cache the response, under what conditions, and for how long.

Source: developers.google.com

From the point of view of performance and page loading speed, the best request is one that does not need to communicate with the web server: a local copy of the response eliminates network latency and reduces the data transferred from the web server. To achieve this, the HTTP specification allows the server to return Cache-Control directives that control how and for how long each individual response can be cached by the browser and other intermediate caches.

The Cache-Control header was defined in the HTTP/1.1 specification and supersedes earlier headers (e.g. Expires) used to define response caching policies. Cache-Control is supported by all current browsers, so nothing else is needed.

The Cache-Control general header field is used to specify directives for caching mechanisms in both requests and responses. Caching directives are unidirectional, in the sense that a particular directive in a request does not imply that the same directive must be given in the response.

Syntax

Directives are case-insensitive and have an optional argument, which can use either token or quoted-string syntax. Multiple directives are separated by commas.
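As a rough sketch of this syntax, the following Python function splits a Cache-Control value into its directives. The name parse_cache_control and the regular expression are assumptions made for illustration; the parser is deliberately tolerant and is not a full implementation of the HTTP grammar.

```python
import re

# A directive name, optionally followed by "=" and either a quoted string
# or a bare token. Commas inside a quoted string do not split directives.
_DIRECTIVE = re.compile(r'([A-Za-z0-9_-]+)(?:\s*=\s*(?:"([^"]*)"|([^,\s]+)))?')


def parse_cache_control(value):
    """Return a dict mapping lower-cased directive names to their argument.

    Directives without an argument map to None.
    """
    directives = {}
    for match in _DIRECTIVE.finditer(value):
        name = match.group(1).lower()  # directive names are case-insensitive
        quoted, token = match.group(2), match.group(3)
        directives[name] = quoted if quoted is not None else token
    return directives


print(parse_cache_control("Public, max-age=60"))
```

Arguments are returned as strings, so a caller that needs a number of seconds still has to convert them with int().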

Cache request directives

Standard Cache-Control directives that can be used by the client in an HTTP request.

Cache-Control: max-age=<seconds>

  • In a request, indicates that the client will not accept a response whose age exceeds the specified number of seconds. Unlike Expires, which names an absolute date, max-age is a relative duration; for example, “max-age=60” asks for a response that is no more than 60 seconds old.

Cache-Control: max-stale[=<seconds>]

  • Indicates that the client is willing to accept a response that has exceeded its freshness lifetime. Optionally, you can assign a value in seconds, in which case the response must not be stale by more than that amount of time.

Cache-Control: min-fresh=<seconds>

  • Indicates that the client wants a response that will still be fresh for at least the specified number of seconds.

Cache-Control: no-cache

  • Forces caches to send the request to the origin server for validation before releasing a cached copy. “no-cache” indicates that the stored response cannot be used to satisfy a subsequent request for the same URL without first checking with the server whether the response has changed. As a result, if a suitable validation token (ETag) is present, no-cache incurs a round trip to validate the cached response, but the download can be skipped if the resource has not changed.

Cache-Control: no-store

  • The cache should not store anything about the client request or the server response. Compared to “no-cache”, the “no-store” directive is much simpler: it prevents the browser and all intermediate caches from storing any version of the response, for example one containing sensitive or banking data. Whenever the user requests such a resource, the request is sent to the server and a complete response is downloaded each time.

Cache-Control: no-transform

  • No transformation or conversion should be made to the resource. The Content-Encoding, Content-Range, Content-Type headers must not be changed by a proxy. For example, a non-transparent proxy might convert between image formats to save cache space or to reduce the amount of traffic on a slow link. The no-transform directive does not allow this.

Cache-Control: only-if-cached

  • Indicates that the client does not want to retrieve new data over the network and wishes to obtain only a stored response. The cache should either answer with a stored response that satisfies the other constraints of the request or return a 504 (Gateway Timeout) status, without contacting the origin server.
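To make the interplay of the time-based request directives more concrete, here is a minimal sketch of how a cache might decide whether a stored response satisfies max-age, max-stale and min-fresh. The function name and parameters are hypothetical, and the logic is a simplified reading of the rules, not a complete cache implementation.

```python
def cache_may_answer(age, lifetime, max_age=None, max_stale=None, min_fresh=None):
    """Decide whether a stored response satisfies the request directives.

    age       -- seconds since the stored response was generated
    lifetime  -- freshness lifetime the server assigned, in seconds
    max_stale -- None (absent), True (any staleness accepted) or seconds
    """
    # max-age: the client refuses responses older than this.
    if max_age is not None and age > max_age:
        return False
    remaining = lifetime - age  # becomes negative once the response is stale
    # min-fresh: the response must stay fresh for at least this long.
    if min_fresh is not None and remaining < min_fresh:
        return False
    if remaining >= 0:
        return True
    # The response is stale: only max-stale can still allow it.
    if max_stale is None:
        return False
    if max_stale is True:
        return True
    return -remaining <= max_stale


print(cache_may_answer(age=90, lifetime=60, max_stale=60))
```

In this sketch a response that is 90 seconds old with a 60-second lifetime is stale by 30 seconds, so it is acceptable only when the request carries max-stale with a value of at least 30 or with no value at all.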

Cache response directives

Standard Cache-Control directives that can be used by the server in an HTTP response.

Cache-Control: must-revalidate

 

  • The cache must verify the status of a stale resource with the origin server before using it; a stale copy must not be served without successful validation.

Cache-Control: no-cache

  • Forces the caches to send the request to the origin server for validation before releasing a cached copy.

Cache-Control: no-store

  • The cache should not store anything on the client request or server response.

Cache-Control: no-transform

  • No transformation or conversion should be made to the resource. The Content-Encoding, Content-Range, Content-Type headers must not be changed by a proxy. For example, a non-transparent proxy might convert between image formats to save cache space or to reduce the amount of traffic on a slow link. The no-transform directive does not allow this.

Cache-Control: public

  • Indicates that the response could be cached by any cache. If the response is marked “public”, it can be cached even if it has HTTP authentication associated with it, and even when the response status code cannot normally be cached. Most of the time, “public” is not needed, because explicit caching of information (eg “max-age”) means that the response can still be cached.

Cache-Control: private

  • Indicates that the response is intended for a single user and should not be cached by a shared cache. A private cache can store the response. Unlike “public”, “private” responses can be cached by the browser but are usually addressed to a single user and therefore cannot be placed in an intermediate cache; for example, an HTML page with sensitive user information can be cached by the user’s browser, but not by a CDN.

Cache-Control: proxy-revalidate

  • Same as must-revalidate, but only applies to shared caches (e.g. proxies) and is ignored by a private cache.

Cache-Control: max-age=<seconds>

  • Specifies the maximum amount of time a resource is considered fresh. Unlike Expires, this directive is relative to the time of the request: it gives the maximum time, in seconds, during which the retrieved response can be reused. For example, “max-age=60” indicates that the response can be cached and reused for the next 60 seconds.

Cache-Control: s-maxage=<seconds>

  • Overrides max-age or the Expires header, but only applies to shared caches (e.g. proxies) and is ignored by a private cache.
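The way these response directives combine can be sketched as a small function that returns how long a given cache may reuse a response. The function name and the dictionary input are assumptions made for illustration. As a deliberate simplification, a response without an explicit lifetime is treated as immediately stale, ignoring Expires and heuristic freshness.

```python
def freshness_lifetime(directives, shared_cache):
    """Return the reuse time in seconds, or None if the cache must not store.

    directives   -- dict of lower-cased directive names to string arguments
    shared_cache -- True for a proxy or CDN, False for a browser cache
    """
    if "no-store" in directives:
        return None  # nothing may be stored at all
    if shared_cache and "private" in directives:
        return None  # only the user's own cache may store it
    if "no-cache" in directives:
        return 0  # storable, but must be revalidated before every reuse
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])  # overrides max-age for shared caches
    if "max-age" in directives:
        return int(directives["max-age"])
    return 0  # simplification: no explicit lifetime means immediately stale


policy = {"max-age": "600", "s-maxage": "3600"}
print(freshness_lifetime(policy, shared_cache=True))   # 3600
print(freshness_lifetime(policy, shared_cache=False))  # 600
```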

Extension Cache-Control Directives

Extension Cache-Control directives are not part of the base document of the HTTP caching standards. Be sure to check the browser compatibility chart for their support.

Cache-Control: immutable

 

  • Indicates that the body of the response will not change over time. As long as the resource has not expired, it is unchanged on the server, so the client does not need to send a conditional revalidation for it (e.g. If-None-Match or If-Modified-Since) to check for updates, even when the user explicitly reloads the page. Clients that are unaware of this extension should ignore it, as the HTTP specification requires. In Firefox, immutable is honored only for https:// transactions.

Cache-Control: stale-while-revalidate=<seconds>

  • Indicates that the client is willing to accept a stale response while a check for a fresh one runs asynchronously in the background. The seconds value indicates how long the client is willing to accept a stale response.

Cache-Control: stale-if-error=<seconds>

  • Indicates that the client is willing to accept a stale response if the check for a fresh one fails. The seconds value indicates how long the client is willing to accept the stale response after the initial expiration.
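As an illustration of the two stale-serving extensions, the sketch below classifies a cached response by its age. The function name, the parameter names and the returned labels are hypothetical; the sketch assumes both windows start counting when the response expires.

```python
def reuse_decision(age, max_age, swr=0, sie=0, origin_failed=False):
    """Classify a cached response.

    age           -- seconds since the response was generated
    max_age       -- freshness lifetime in seconds
    swr           -- stale-while-revalidate window in seconds
    sie           -- stale-if-error window in seconds
    origin_failed -- True if contacting the origin server failed
    """
    if age < max_age:
        return "fresh"
    staleness = age - max_age
    if staleness <= swr:
        # Serve the stale copy now and refresh it in the background.
        return "serve-stale-and-revalidate"
    if origin_failed and staleness <= sie:
        # The origin is unreachable: fall back to the stale copy.
        return "serve-stale-on-error"
    return "revalidate"


print(reuse_decision(age=70, max_age=60, swr=30))
# serve-stale-and-revalidate
```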

Cancel and update cached responses

Once a resource is cached, it could theoretically be served from the cache forever. Caches have limited storage space, so items are periodically removed from storage, a process called cache eviction. On the other hand, some resources may change on the server, so the cache should be updated. Since HTTP is a client-server protocol, servers cannot contact caches and clients when a resource changes; they must communicate an expiration time for the resource. Before this expiration time, the resource is fresh; after it, the resource is stale.

All HTTP requests from the browser are first sent to the browser cache to see if there is a valid response in it that can be used to satisfy the request. If there is a match, the response is read from the cache, eliminating both network latency and data transfer costs.

But what if we want to update or invalidate a cached response? For example, suppose we have told our visitors to cache a CSS style sheet for up to 24 hours (max-age=86400), but our webmaster has made an update that we want to make available to all users. How can we tell all visitors that their cached copy of our CSS is now obsolete, so that they update their caches? This is a trick question: we can’t, unless we change the URL of the resource.

Once a response is cached by the browser, that version is used as long as it is fresh, as determined by max-age or Expires, or until it is removed from the cache for some other reason, such as the user clearing the browser cache. As a result, different users may end up using different versions of the file when the page is built: users who have just fetched the response use the new version, while those who cached an older (but still valid) copy use the older version.

So how can we take full advantage of both worlds – client-side caching and updates? Simple: we can change the URL of the resource and force the user to download the new response every time the content changes. This is usually possible by including a file fingerprint or version number in the filename, eg. style.x234dff.css.
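One possible way to produce such fingerprinted filenames is to derive the fingerprint from a hash of the file contents. The sketch below assumes SHA-256 truncated to 8 hexadecimal characters; the function name and the digest length are illustrative choices, not something the article prescribes.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename, content, length=8):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # "css/style.css" becomes "css/style.<digest>.css"
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


print(fingerprinted_name("css/style.css", b"body { margin: 0; }"))
```

Because the name depends only on the contents, an unchanged file keeps its URL and stays cached, while any edit yields a new URL that browsers have never seen and therefore download.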

[Figure: caching policies for an HTML page and its CSS, JavaScript and image resources. Source: developers.google.com]

The ability to define caching policies for individual resources allows us to define “caching hierarchies” that control not only how long each resource stays in the cache, but also how quickly users see new versions. To illustrate this, consider a page made up of an HTML document, a CSS file, a JavaScript file and an image:

The HTML is marked as “no-cache”, which means that the browser must always revalidate the document on each request and retrieve the latest version if the contents have changed. In addition, the HTML markup contains fingerprints in the URLs for CSS and JavaScript: if the content of those files changes, the HTML of the page also changes and a new copy of the HTML response is downloaded.

CSS can be cached by browsers and intermediary caches (e.g., a CDN) and is set to expire at 1 year. Note that we can safely use 1 year “far future expires” because we have included a fingerprint of the file: if the CSS is updated, the URL also changes.

The JavaScript expiration is also set to 1 year, but is marked as “private”, possibly because it contains sensitive user data that the CDN cannot cache.

The image is cached without a unique version or fingerprint, with 1 day expiration.

The combination of ETag, Cache-Control and unique URLs allows us to offer the best from both sides: long expiration times, control over the response caching path and on-demand updates.
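The hierarchy described above could be expressed as a simple lookup from file extension to Cache-Control value. The exact header values are my reading of the description rather than a copy of the original figure, and the fallback for unknown file types is an assumption.

```python
ONE_DAY = 24 * 60 * 60     # 86400 seconds
ONE_YEAR = 365 * ONE_DAY   # 31536000 seconds

POLICY_BY_EXTENSION = {
    "html": "no-cache",                      # always revalidate the document
    "css": f"public, max-age={ONE_YEAR}",    # fingerprinted URL, CDN allowed
    "js": f"private, max-age={ONE_YEAR}",    # fingerprinted URL, browser only
    "jpg": f"max-age={ONE_DAY}",             # no fingerprint, short lifetime
}


def cache_control_for(path):
    """Pick a Cache-Control value based on the file extension."""
    extension = path.rsplit(".", 1)[-1].lower()
    # Assumption: unknown types are revalidated on every request.
    return POLICY_BY_EXTENSION.get(extension, "no-cache")


print(cache_control_for("style.x234dff.css"))
# public, max-age=31536000
```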

Validating cached files with ETag

The most efficient way to check whether an expired cached file needs updating is the ETag header. The server uses the ETag HTTP header to communicate a validation token. The validation token allows for efficient checks for resource updates: no data is transferred if the resource has not changed. In practice, a token is assigned to each file; if the file is updated (e.g. a .css file), the token changes and the cache has to fetch the new version.

Suppose 120 seconds have passed since the initial fetch and the browser has started a new request for the same resource. First, the browser checks the local cache and finds the previous response. Unfortunately, it cannot use it, because the response has now expired. At this point, the browser could simply send a new request and retrieve the full response again. However, this is inefficient: if the resource hasn’t changed, there is no reason to re-download the same bytes that are already in the cache.

[Figure: revalidating an expired response with ETag and If-None-Match. Source: developers.google.com]

It is precisely to solve this type of problem that validation tokens, and specifically the ETag header, were created. The server generates and sends an arbitrary token, usually a hash or fingerprint of the contents of the file. The client does not need to know how the fingerprint was generated; it just has to send it to the server with the next request. If the fingerprint is still the same, the resource has not been modified and the download can be skipped.

In this example, the client automatically supplies the ETag token in the “If-None-Match” HTTP request header. The server compares the token with that of the current resource and, if it has not changed, sends a “304 Not Modified” response, telling the browser that the response it has cached has not changed and can be reused for another 120 seconds. Note that there is no need to download the response again, which saves time and bandwidth.

As a web developer, how can you benefit from efficient revalidation? The browser does all the work on our behalf: it automatically detects whether a validation token was previously specified, adds it to the outgoing request, and updates the cache timestamps based on the response received from the server. The only thing left for us to do is make sure the server is actually providing the necessary ETag tokens; you can use an HTTP header viewer to check.
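The server side of this exchange can be sketched as follows. This is a simplified model, not the behaviour of any particular web server: the function names are hypothetical, the ETag is assumed to be a truncated SHA-256 hash of the body, and If-None-Match is compared as a single exact value (a real server also accepts lists of tags and the “*” wildcard).

```python
import hashlib


def make_etag(body):
    """Build a quoted validation token from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, request_headers):
    """Return (status, headers, payload) for a possibly conditional request."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=120"}
    if request_headers.get("If-None-Match") == etag:
        # The client's copy is still current: send no body at all.
        return 304, headers, b""
    return 200, headers, body


status, headers, payload = respond(b"body { margin: 0; }", {})
print(status, headers["ETag"])

# Second request: the browser sends back the token it received earlier.
status, _, payload = respond(b"body { margin: 0; }", {"If-None-Match": headers["ETag"]})
print(status, len(payload))
```

The second call returns status 304 with an empty payload, which is exactly the saving the validation token is meant to provide.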

Caching strategies

[Figure: decision tree for choosing a caching policy. Source: developers.google.com]

To define your best caching strategy, you could follow this decision tree. The image can help you determine the optimal caching policy for a particular resource or set of resources used by the website. Ideally, you should try to store as many responses as possible on the client for as long as possible and provide validation tokens for each response to allow for efficient revalidation.
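A decision procedure of this kind might look like the sketch below. The questions and their order are an assumption based on the directives described in this guide, not a transcription of the figure, and the function and parameter names are hypothetical.

```python
def choose_cache_control(reusable, revalidate_each_time, shared_ok, max_age):
    """Pick a Cache-Control value by answering a few yes/no questions.

    reusable             -- may the response be stored and reused at all?
    revalidate_each_time -- must the server be asked before every reuse?
    shared_ok            -- may a CDN or proxy store the response?
    max_age              -- maximum cache lifetime in seconds
    """
    if not reusable:
        return "no-store"
    if revalidate_each_time:
        return "no-cache"
    visibility = "public" if shared_ok else "private"
    return f"{visibility}, max-age={max_age}"


print(choose_cache_control(True, False, True, 31536000))
# public, max-age=31536000
```

Whatever policy comes out, the response should still carry an ETag so that the cache can revalidate it cheaply once it expires.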

According to HTTP Archive, among the top 300,000 sites (by Alexa ranking), the browser can cache nearly half of all downloaded responses, a huge saving for repeat visits. Of course, that doesn’t mean your particular website can cache 50% of its resources. Some sites can cache more than 90% of their resources, while others have a lot of private or time-sensitive data that can’t be cached at all.

Check your pages to identify which resources can be cached and make sure they return appropriate Cache-Control and ETag headers.

Please note that there is no single best caching policy. Depending on your traffic patterns, the type of data served, and the application’s specific requirements for data freshness, you will need to define and configure the appropriate settings for each resource, as well as the overall “caching hierarchy”.

Some tips and techniques to keep in mind when defining your caching strategy:

  • Use canonical and consistent URLs: if you serve the same content on different URLs, that content will be retrieved and stored multiple times. Remember that URLs are case sensitive.
  • Make sure the server provides a validation token (ETag): with validation tokens, the same bytes no longer need to be transferred when the resource on the server hasn’t changed.
  • Identify resources that can be cached by intermediaries: those with identical responses for all users are perfect candidates for caching by a CDN and other intermediaries.
  • Establish the optimal cache lifetime for each resource: different resources may have different freshness requirements. Check and establish the appropriate max-age value for each.
  • Establish the best caching hierarchy for your site: the combination of resource URLs with content fingerprints and short or no-cache lifetimes for HTML documents allows you to control how quickly the client picks up updates.
  • Minimize downloads: some resources are updated more frequently than others. If a certain part of a resource (e.g. a JavaScript function or a set of CSS styles) is updated frequently, consider delivering that code as a separate file. That way, the rest of the content, which doesn’t change very often, can be retrieved from the cache, minimizing the amount of content downloaded with each update.