postgres-web/docs/frontend.rst

Frontend & Backend
==================
The postgresql.org website is designed to run in a frontend/backend
scenario. This is to achieve both load distribution and redundancy for
the case when the main webserver (known as wwwmaster) goes offline or
becomes loaded.

Previous versions of the website used static files on the frontend,
that were spidered at regular intervals and then push-rsynced out to
the frontends. This made the frontends entirely independent of the
backends, and as such very "available". Unfortunately it made a lot of
coding difficult, and had very bad performance (a re-spidering
including documentation and ftp would take more than 6 hours on a fast
machine).

This generation of the website will instead rely on a varnish web
cache running on the frontend servers, configured to cache things for
a long time. It will also run in what's known as "grace mode", which
will have varnish keep serving the content from the cache even if it
has expired in case the backend cannot be contacted.

All forms that require login will be processed directly by the master
server, just like before. These will *always* be processed over SSL,
and as such not sent through varnish at all. They will still be
accessed using the domain www.postgresql.org, which will then simply
proxy the SSL connection to the backend. For the initial deployment
we'll just use HAProxy, but we may switch to a more feature-rich
proxy server in the future - in which case it's important to maintain
the encrypted channel between the frontend and the backend, since
they are normally not in the same datacenter.

Requests that require *up to the second* content but do *not* require
a login, such as a mirror selection, will be sent through the
frontends (and served under the www.postgresql.org name) but without
caching enabled. Note that in most cases, these should actually be
cached for at least 5 or 10 seconds, to cut off any short term high
load effects (aka the slashdot effect).

Normal requests are always cached. There is a default cache expiry
that is set on all pages. There is a longer default policy set for
images, because they are considered never to change. Any view in the
django project can override this default, by specifying the
"Cache-control: s-maxage=xxx" http header on the response. This is
done by using the @cache() decorator on the view method. Caching
should be kept lower for pages that have frequently updating data,
such as the front page or the survey results page.

Any model can define a tuple or a function
called *purge_urls* (if it's a function, it will be called and
should return a tuple or a generator). Each entry is a regular
expression, and this data will be automatically removed from the
frontend caches whenever this object changes. The regular expression
will always be prepended with ^, and should be rooted with /.
It should be made as restrictive as possible (for example, don't
purge "/" since that will remove everything from the cache completely).
This makes it possible to have frontends react instantly to changes,
while maintaining high cacheability.

Any model can also define a tuple or function called *purge_xkeys*.
This work exactly the same way except it purges based on xkey instead
of URL, which is usually the better choice.

Finally, there is a form on the admin web interface that lets the
administrator manually purge pages from the caches. This may be
necessary if changes have been made to static pages and/or site
structure that otherwise wouldn't show up until the cache has
expired.