HTTP caching for the masses

February 12, 2011 | categories: Python, Web, Programming, Plone | View Comments

Why HTTP caching?

  • your websites respond faster, your users become happier
  • improves Google ranking
  • reduces running costs, less CPU power needed

How it works

This article describes an alternative approach to HTTP caching with Plone. The way the module that I'm presenting here works is it:

  • hooks into a post publication event (IAfterPublicationEvent to be exact)
  • determines what content type is being published, is it a static image, a dynamic page?
  • looks up a caching policy that matches the type of content being serverd
  • applies the caching policy, which is just a bag of HTTP response headers

All in 136 lines of code. No components, no indirection, easy to grasp.

Since HTTP caching works on the HTTP level, it works pretty much the same everywhere. You should be able to quickly adjust this code to suit your Python framework of choice. And also your individual caching needs...

But, wait, confused about HTTP caching? Not sure what the difference between maxage and s-maxage is? Then do yourself a favour and head over to the caching tutorial at mnot.net. It's a must read if you're working with caching. And the good news is it's very well written, and caching is actually easy!

Download

You can download my caching module over at GitHub. What follows is an explanation of what it does in detail. This will allow you to understand the code and adjust it to your needs.

Understand the code

(If at this point you're still reading this in your RSS application, and you're not seeing syntax colouring, you might want to head over to my blog.)

The set_cache_headers function gives a good overview of what's happening:

@component.adapter(Interface, IAfterPublicationEvent)
def set_cache_headers(object, event):
    request = event.request
    response = request.response

    # If no caching policy was previously set, we'll choose one at this point:
    caching_policy = response.headers.get(CACHE_POLICY_HEADER)
    if caching_policy is None:
        caching_policy = _choose_caching_policy(object, request)
    if caching_policy:
        # Set a header on the response with the policy chosen
        response.setHeader(CACHE_POLICY_HEADER, caching_policy)

    # Here's where we actually set the cache headers:
    if caching_policy:
        caching_policies[caching_policy](response)

Note how function _choose_caching_policy is asked to determine a caching policy. What this function does is it introspects the object that's being published (we're talking Bobo here), and then it employs a simple if ... elif ... elif ... sequence to determine which caching policy is appropriate. An excerpt from _choose_caching_policy:

if portal_type == 'Plone Site' and request.response.status == 302:
    return 'No Cache' # don't cache redirects on the root
elif content_type.startswith('text/html'):
    return 'Cache HTML'
elif ...

If _choose_caching_policy returns a policy name, in set_cache_headers, we look up the policy function in the caching_policies dict and and call it:

if caching_policy:
    caching_policies[caching_policy](response)

The caching_policies dict is where we define all our policies:

caching_policies = {
    'Cache HTML':
    lambda response: _set_max_age(response, datetime.timedelta(days=-1),
                                  cache_ctrl={'s-maxage': '3600'}),
    'Cache Media Content':
    lambda response: _set_max_age(response, datetime.timedelta(hours=4)),
    'Cache Resource':
    lambda response: _set_max_age(response, datetime.timedelta(days=32),
                                  cache_ctrl={'public': None}),
    'No Cache':
    lambda response: _set_max_age(response, datetime.timedelta(days=-1)),
}

You can see how each caching policy delegates to another function called _set_max_age. This powerhouse of a caching subroutine computes the actual headers to be used and sets them on the response.

Take a closer look at Cache HTML to understand what this policy does:

_set_max_age(response, timedelta(days=-1), cache_ctrl={'s-maxage': '3600'})

This reads as:

Never cache this in the browser (timedelta(days=-1)), but do cache it in the proxy for one hour ({'s-maxage': '3600'}).

The Cache Resource policy sets the freshness to 32 days, and adds public to the Cache-Control header:

_set_max_age(response, timedelta(days=32), cache_ctrl={'public': None})

How to test it

We want to test two different things here:

  1. The response headers that we want are actually set. Our code works.
  2. The headers have the desired effect. Our theory works.

For (1) we will use functional tests. It's important to get (1) right before moving on to (2).

For (2) you should use tools like Page Speed or the Cacheability Query.

What follows is an example of a functional doctest that uses zope.testbrowser to test that the right headers are set, for (1).

Some convenience functions:

>>> import datetime, time
>>> def parse_expires(date_string):
...     return datetime.datetime(*
...         (time.strptime(date_string,
...          "%a, %d %b %Y %H:%M:%S GMT")[0:6]))
>>> def delta(date_string):
...     now = datetime.datetime.utcnow()
...     return parse_expires(date_string) - now

Check the caching policy and response headers that are set for folders:

>>> browser.open(portal['myfolder'].absolute_url())
>>> browser.headers['X-Caching-Policy']
'Cache HTML'
>>> browser.headers['Cache-Control']
'max-age=0,s-maxage=3600'
>>> d = delta(browser.headers['Expires'])
>>> (d.days, d.seconds) < (0, 0)
True

What headers are set for images that are stored in the CMS?:

>>> some_image = portal['images']['bar.jpg']
>>> browser.open(some_image.absolute_url())
>>> browser.headers['X-Caching-Policy']
'Cache Media Content'
>>> browser.headers['Cache-Control']
'max-age=14400'
>>> d = delta(browser.headers['Expires'])
>>> (d.days, d.seconds) > (0, 14000)
True

And lastly, we want static resources to be cached for a long time:

>>> browser.open(portal_url + '/++resource++myresources/logo.png')
>>> browser.headers['X-Caching-Policy']
'Cache Resource'
>>> browser.headers['Cache-Control']
'max-age=2764800,public'
>>> d = delta(browser.headers['Expires'])
>>> (d.days, d.seconds) > (30, 0)
True
>>> 'Last-Modified' in browser.headers
True