mirror of
https://github.com/apache/httpd.git
synced 2025-08-01 16:41:19 +00:00

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1836060 13f79535-47bb-0310-9956-ffa450edef68
953 lines
41 KiB
XML
<?xml version="1.0" encoding="UTF-8" ?>
|
|
<!DOCTYPE manualpage SYSTEM "style/manualpage.dtd">
|
|
<?xml-stylesheet type="text/xsl" href="style/manual.en.xsl"?>
|
|
<!-- $LastChangedRevision$ -->
|
|
|
|
<!--
|
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
|
contributor license agreements. See the NOTICE file distributed with
|
|
this work for additional information regarding copyright ownership.
|
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
|
(the "License"); you may not use this file except in compliance with
|
|
the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
-->
|
|
|
|
<manualpage metafile="caching.xml.meta">
|
|
|
|
<title>Caching Guide</title>
|
|
|
|
<summary>
|
|
<p>This document supplements the <module>mod_cache</module>,
|
|
<module>mod_cache_disk</module>, <module>mod_file_cache</module> and <a
|
|
href="programs/htcacheclean.html">htcacheclean</a> reference documentation.
|
|
It describes how to use the Apache HTTP Server's caching features to accelerate web and
|
|
proxy serving, while avoiding common problems and misconfigurations.</p>
|
|
</summary>
|
|
|
|
<section id="introduction">
|
|
<title>Introduction</title>
|
|
|
|
<p>The Apache HTTP server offers a range of caching features that
|
|
are designed to improve the performance of the server in various
|
|
ways.</p>
|
|
|
|
<dl>
|
|
<dt>Three-state RFC2616 HTTP caching</dt>
|
|
<dd>
|
|
<module>mod_cache</module>
|
|
and its provider modules, such as
<module>mod_cache_disk</module>,
provide intelligent, HTTP-aware caching. The content itself is stored
|
|
in the cache, and mod_cache aims to honor all of the various HTTP
|
|
headers and options that control the cacheability of content
|
|
as described in
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html">Section
|
|
13 of RFC2616</a>.
|
|
<module>mod_cache</module>
|
|
is aimed at both simple and complex caching configurations, where
|
|
you are dealing with proxied content, dynamic local content or
|
|
have a need to speed up access to local files on a potentially
|
|
slow disk.
|
|
</dd>
|
|
|
|
<dt>Two-state key/value shared object caching</dt>
|
|
<dd>
|
|
The <a href="socache.html">shared object cache API</a> (socache)
|
|
and its provider modules provide a
|
|
server wide key/value based shared object cache. These modules
|
|
are designed to cache low level data such as SSL sessions and
|
|
authentication credentials. Backends allow the data to be stored
|
|
server wide in shared memory, or datacenter wide in a cache such
|
|
as memcache or distcache.
|
|
</dd>
|
|
|
|
<dt>Specialized file caching</dt>
|
|
<dd>
|
|
<module>mod_file_cache</module>
|
|
offers the ability to pre-load
|
|
files into memory on server startup, and can improve access
|
|
times and save file handles on files that are accessed often,
|
|
as there is no need to go to disk on each request.
|
|
</dd>
|
|
</dl>
|
|
|
|
<p>To get the most from this document, you should be familiar with
|
|
the basics of HTTP, and have read the Users' Guides to
|
|
<a href="urlmapping.html">Mapping URLs to the Filesystem</a> and
|
|
<a href="content-negotiation.html">Content negotiation</a>.</p>
|
|
|
|
</section>
|
|
|
|
<section id="http-caching">
|
|
|
|
<title>Three-state RFC2616 HTTP caching</title>
|
|
|
|
<related>
|
|
<modulelist>
|
|
<module>mod_cache</module>
|
|
<module>mod_cache_disk</module>
|
|
</modulelist>
|
|
<directivelist>
|
|
<directive module="mod_cache">CacheEnable</directive>
|
|
<directive module="mod_cache">CacheDisable</directive>
|
|
<directive module="core">UseCanonicalName</directive>
|
|
<directive module="mod_negotiation">CacheNegotiatedDocs</directive>
|
|
</directivelist>
|
|
</related>
|
|
|
|
<p>The HTTP protocol contains built-in support for an in-line caching
|
|
mechanism
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html">
|
|
described by section 13 of RFC2616</a>, and the
|
|
<module>mod_cache</module> module can be used to take advantage of
|
|
this.</p>
|
|
|
|
<p>Unlike a simple two-state key/value cache where the content
|
|
disappears completely when no longer fresh, an HTTP cache includes
|
|
a mechanism to retain stale content, and to ask the origin server
|
|
whether this stale content has changed and if not, make it fresh
|
|
again.</p>
|
|
|
|
<p>An entry in an HTTP cache exists in one of three states:</p>
|
|
|
|
<dl>
|
|
<dt>Fresh</dt>
|
|
<dd>
|
|
If the content is new enough (younger than its <strong>freshness
|
|
lifetime</strong>), it is considered <strong>fresh</strong>. An
|
|
HTTP cache is free to serve fresh content without making any
|
|
calls to the origin server at all.
|
|
</dd>
|
|
<dt>Stale</dt>
|
|
<dd>
|
|
<p>If the content is too old (older than its <strong>freshness
|
|
lifetime</strong>), it is considered <strong>stale</strong>. An
|
|
HTTP cache should contact the origin server and check whether
|
|
the content is still fresh before serving stale content to a
|
|
client. The origin server will either respond with replacement
|
|
content if it is no longer valid or, ideally, the origin server will
|
|
respond with a code to tell the cache the content is still
|
|
fresh, without the need to generate or send the content again.
|
|
The content becomes fresh again and the cycle continues.</p>
|
|
|
|
<p>The HTTP protocol does allow the cache to serve stale data
|
|
under certain circumstances, such as when an attempt to freshen
|
|
the data with an origin server has failed with a 5xx error, or
|
|
when another request is already in the process of freshening
|
|
the given entry. In these cases a <code>Warning</code> header
|
|
is added to the response.</p>
|
|
</dd>
|
|
<dt>Non Existent</dt>
|
|
<dd>
|
|
If the cache gets full, it reserves the option to delete content
|
|
from the cache to make space. Content can be deleted at any time,
|
|
and can be stale or fresh. The <a
|
|
href="programs/htcacheclean.html">htcacheclean</a> tool can be
|
|
run on a one-off basis, or deployed as a daemon to keep the size
|
|
of the cache within the given size, or the given number of inodes.
|
|
The tool attempts to delete stale content before attempting to
|
|
delete fresh content.
|
|
</dd>
|
|
</dl>
|
|
|
|
<p>Full details of how HTTP caching works can be found in
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html">
|
|
Section 13 of RFC2616</a>.</p>
|
|
|
|
<section>
|
|
<title>Interaction with the Server</title>
|
|
|
|
<p>The <module>mod_cache</module> module hooks into the server in two
|
|
possible places depending on the value of the
|
|
<directive module="mod_cache">CacheQuickHandler</directive> directive:
|
|
</p>
|
|
|
|
<dl>
|
|
<dt>Quick handler phase</dt>
|
|
<dd>
|
|
<p>This phase happens very early on during the request processing,
|
|
just after the request has been parsed. If the content is
|
|
found within the cache, it is served immediately and almost
|
|
all request processing is bypassed.</p>
|
|
|
|
<p>In this scenario, the cache behaves as if it has been "bolted
|
|
on" to the front of the server.</p>
|
|
|
|
<p>This mode offers the best performance, as the majority of
|
|
server processing is bypassed. This mode however also bypasses the
|
|
authentication and authorization phases of server processing, so
|
|
this mode should be chosen with care when this is important.</p>
|
|
|
|
<p>Requests with an "Authorization" header (for example, HTTP Basic
|
|
Authentication) are neither cacheable nor served from the cache
|
|
when <module>mod_cache</module> is running in this phase.</p>
|
|
</dd>
|
|
<dt>Normal handler phase</dt>
|
|
<dd>
|
|
<p>This phase happens late in the request processing, after all
|
|
the request phases have completed.</p>
|
|
|
|
<p>In this scenario, the cache behaves as if it has been "bolted
|
|
on" to the back of the server.</p>
|
|
|
|
<p>This mode offers the most flexibility, as the potential exists
|
|
for caching to occur at a precisely controlled point in the filter
|
|
chain, and cached content can be filtered or personalized before
|
|
being sent to the client.</p>
|
|
</dd>
|
|
</dl>
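
<p>As a minimal sketch, assuming <module>mod_cache_disk</module> is the
storage provider and <module>mod_deflate</module> is loaded, the normal
handler phase could be selected and the cache placed ahead of compression
in the filter chain as follows. The content type is illustrative only.</p>

<highlight language="config">
# Run the cache as a normal handler rather than as a quick handler
CacheQuickHandler off
CacheEnable disk "/"
# Cache the response before it is compressed
AddOutputFilterByType CACHE;DEFLATE text/html
</highlight>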
|
|
|
|
<p>If the URL is not found within the cache, <module>mod_cache</module>
|
|
will add a <a href="filter.html">filter</a> to the filter stack in order
|
|
to record the response to the cache, and then stand down, allowing normal
|
|
request processing to continue. If the content is determined to be
|
|
cacheable, the content will be saved to the cache for future serving,
|
|
otherwise the content will be ignored.</p>
|
|
|
|
<p>If the content found within the cache is stale, the
|
|
<module>mod_cache</module> module converts the request into a
|
|
<strong>conditional request</strong>. If the origin server responds with
|
|
a normal response, the normal response is cached, replacing the content
|
|
already cached. If the origin server responds with a 304 Not Modified
|
|
response, the content is marked as fresh again, and the cached content
|
|
is served by the filter instead of saving it.</p>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Improving Cache Hits</title>
|
|
|
|
<p>When a virtual host is known by one of many different server aliases,
|
|
ensuring that <directive module="core">UseCanonicalName</directive> is
|
|
set to <code>On</code> can dramatically improve the ratio of cache hits.
|
|
This is because the hostname of the virtual-host serving the content is
|
|
used within the cache key. With the setting set to <code>On</code>
|
|
virtual-hosts with multiple server names or aliases will not produce
|
|
differently cached entities, and instead content will be cached as
|
|
per the canonical hostname.</p>
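
<p>A sketch of such a virtual host follows. The hostnames are
placeholders, and a disk cache is assumed.</p>

<highlight language="config">
&lt;VirtualHost *:80&gt;
    ServerName www.example.com
    ServerAlias example.com www.example.org
    # Key cached entries on www.example.com, whichever alias was requested
    UseCanonicalName On
    CacheEnable disk "/"
&lt;/VirtualHost&gt;
</highlight>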
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>Freshness Lifetime</title>
|
|
|
|
<p>Well formed content that is intended to be cached should declare an
|
|
explicit freshness lifetime with the <code>Cache-Control</code>
|
|
header's <code>max-age</code> or <code>s-maxage</code> fields, or
|
|
by including an <code>Expires</code> header.</p>
|
|
|
|
<p>At the same time, the origin server defined freshness lifetime can
|
|
be overridden by a client when the client presents their own
|
|
<code>Cache-Control</code> header within the request. In this case,
|
|
the lowest freshness lifetime between request and response wins.</p>
|
|
|
|
<p>When this freshness lifetime is missing from the request or the
|
|
response, a default freshness lifetime is applied. The default
|
|
freshness lifetime for cached entities is one hour, however
|
|
this can be easily overridden by using the <directive
|
|
module="mod_cache">CacheDefaultExpire</directive> directive.</p>
|
|
|
|
<p>If a response does not include an <code>Expires</code> header but does
|
|
include a <code>Last-Modified</code> header, <module>mod_cache</module>
|
|
can infer a freshness lifetime based on a heuristic, which can be
|
|
controlled through the use of the <directive
|
|
module="mod_cache">CacheLastModifiedFactor</directive> directive.</p>
|
|
|
|
<p>For local content, or for remote content that does not define its own
|
|
<code>Expires</code> header, <module>mod_expires</module> may be used to
|
|
fine-tune the freshness lifetime by adding <code>max-age</code> and
|
|
<code>Expires</code>.</p>
|
|
|
|
<p>The maximum freshness lifetime may also be controlled by using the
|
|
<directive module="mod_cache">CacheMaxExpire</directive> directive.</p>
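
<p>The directives discussed in this section might be combined as in the
following sketch. The values given for the three
<module>mod_cache</module> directives are believed to match their
documented defaults. The last two lines assume that
<module>mod_expires</module> is loaded, and the media type and period
there are illustrative.</p>

<highlight language="config">
# Freshness lifetime, in seconds, when the response supplies none
CacheDefaultExpire 3600
# Upper bound on any freshness lifetime, in seconds
CacheMaxExpire 86400
# Heuristic: lifetime = time since Last-Modified * factor,
# so a file last modified 10 hours ago is treated as fresh for 1 hour
CacheLastModifiedFactor 0.1

# Add Expires and Cache-Control: max-age to PNG images
ExpiresActive On
ExpiresByType image/png "access plus 1 day"
</highlight>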
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>A Brief Guide to Conditional Requests</title>
|
|
|
|
<p>When content expires from the cache and becomes stale, rather than
|
|
pass on the original request, httpd will modify the request to make
|
|
it conditional instead.</p>
|
|
|
|
<p>When an <code>ETag</code> header exists in the original cached
|
|
response, <module>mod_cache</module> will add an
|
|
<code>If-None-Match</code> header to the request to the origin server.
|
|
When a <code>Last-Modified</code> header exists in the original
|
|
cached response, <module>mod_cache</module> will add an
|
|
<code>If-Modified-Since</code> header to the request to the origin
|
|
server. Performing either of these actions makes the request
|
|
<strong>conditional</strong>.</p>
|
|
|
|
<p>When a conditional request is received by an origin server, the
|
|
origin server should check whether the ETag or the Last-Modified
|
|
parameter has changed, as appropriate for the request. If not, the
|
|
origin should respond with a terse "304 Not Modified" response. This
|
|
signals to the cache that the stale content is still fresh and should be
|
|
used for subsequent requests until the content's new freshness lifetime
|
|
is reached again.</p>
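
<p>For illustration, a revalidation exchange might look like the
following. The URL, entity tag and dates are invented for this example,
and only the relevant headers are shown.</p>

<example><pre>
GET /index.html HTTP/1.1
Host: www.example.com
If-None-Match: "5f3e-56b2a1c4d8e40"
If-Modified-Since: Tue, 01 May 2018 12:00:00 GMT

HTTP/1.1 304 Not Modified
ETag: "5f3e-56b2a1c4d8e40"
Cache-Control: max-age=3600</pre>
</example>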
|
|
|
|
<p>If the content has changed, then the content is served as if the
|
|
request were not conditional to begin with.</p>
|
|
|
|
<p>Conditional requests offer two benefits. Firstly, when making such
|
|
a request to the origin server, if the content from the origin
|
|
matches the content in the cache, this can be determined easily and
|
|
without the overhead of transferring the entire resource.</p>
|
|
|
|
<p>Secondly, a well designed origin server will be designed in such
|
|
a way that conditional requests will be significantly cheaper to
|
|
produce than a full response. For static files, typically all that is
|
|
involved is a call to <code>stat()</code> or similar system call, to
|
|
see if the file has changed in size or modification time. As such, even
|
|
local content may still be served faster from the cache if it has not
|
|
changed.</p>
|
|
|
|
<p>Origin servers should support conditional requests wherever
practical. However, if conditional requests are not
|
|
supported, the origin will respond as if the request was not
|
|
conditional, and the cache will respond as if the content had changed
|
|
and save the new content to the cache. In this case, the cache will
|
|
behave like a simple two state cache, where content is effectively
|
|
either fresh or deleted.</p>
|
|
</section>
|
|
|
|
<section>
|
|
<title>What Can be Cached?</title>
|
|
|
|
<p>The full definition of which responses can be cached by an HTTP
|
|
cache is defined in
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.4">
|
|
RFC2616 Section 13.4 Response Cacheability</a>, and can be summed up as
|
|
follows:</p>
|
|
|
|
<ol>
|
|
<li>Caching must be enabled for this URL. See the <directive
|
|
module="mod_cache">CacheEnable</directive> and <directive
|
|
module="mod_cache">CacheDisable</directive> directives.</li>
|
|
|
|
<li>If the response has an HTTP status code other than 200, 203, 300,
|
|
301 or 410 it must also specify an "Expires" or "Cache-Control" header.
|
|
</li>
|
|
|
|
<li>The request must be an HTTP GET request.</li>
|
|
|
|
<li>If the request contains an "Authorization:" header, the response must
|
|
also contain an "s-maxage", "must-revalidate" or "public" option
|
|
in the "Cache-Control:" header, or it won't be cached.</li>
|
|
|
|
<li>If the URL included a query string (e.g. from an HTML form GET
|
|
method) it will not be cached unless the response specifies an
|
|
explicit expiration by including an "Expires:" header or the max-age
|
|
or s-maxage directive of the "Cache-Control:" header, as per RFC2616
|
|
sections 13.9 and 13.2.1.</li>
|
|
|
|
<li>If the response has a status of 200 (OK), the response must
|
|
also include at least one of the "Etag", "Last-Modified" or
|
|
the "Expires" headers, or the max-age or s-maxage directive of
|
|
the "Cache-Control:" header, unless the
|
|
<directive module="mod_cache">CacheIgnoreNoLastMod</directive>
|
|
directive has been used to require otherwise.</li>
|
|
|
|
<li>If the response includes the "private" option in a "Cache-Control:"
|
|
header, it will not be stored unless the
|
|
<directive module="mod_cache">CacheStorePrivate</directive> directive has been
|
|
used to require otherwise.</li>
|
|
|
|
<li>Likewise, if the response includes the "no-store" option in a
|
|
"Cache-Control:" header, it will not be stored unless the
|
|
<directive module="mod_cache">CacheStoreNoStore</directive> directive has been
|
|
used.</li>
|
|
|
|
<li>A response will not be stored if it includes a "Vary:" header
|
|
containing the match-all "*".</li>
|
|
</ol>
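
<p>The directives named in the list might be used as in the sketch below.
The paths are placeholders. The last three directives deliberately relax
the behaviour described above, so they should be enabled only where that
is really intended.</p>

<highlight language="config">
# Enable caching for the whole site, but not below /private
CacheEnable disk "/"
CacheDisable "/private"

# Accept 200 responses that lack ETag, Last-Modified and expiry information
CacheIgnoreNoLastMod On
# Store responses marked Cache-Control: private
CacheStorePrivate On
# Store responses marked Cache-Control: no-store
CacheStoreNoStore On
</highlight>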
|
|
</section>
|
|
|
|
<section>
|
|
<title>What Should Not be Cached?</title>
|
|
|
|
<p>It should be up to the client creating the request, or the origin
|
|
server constructing the response to decide whether or not the content
|
|
should be cacheable by correctly setting the
|
|
<code>Cache-Control</code> header, and <module>mod_cache</module> should
|
|
be left alone to honor the wishes of the client or server as appropriate.
|
|
</p>
|
|
|
|
<p>Content that is time sensitive, or which varies depending on the
|
|
particulars of the request that are not covered by HTTP negotiation,
|
|
should not be cached. This content should declare itself uncacheable
|
|
using the <code>Cache-Control</code> header.</p>
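
<p>Where the application cannot set the header itself, one possible
approach, assuming <module>mod_headers</module> is loaded, is to add it in
the server configuration. The path is a placeholder.</p>

<highlight language="config">
&lt;Location "/account"&gt;
    # Declare everything below /account uncacheable
    Header set Cache-Control "no-store"
&lt;/Location&gt;
</highlight>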
|
|
|
|
<p>If content changes often, expressed by a freshness lifetime of minutes
|
|
or seconds, the content can still be cached, however it is highly
|
|
desirable that the origin server supports
|
|
<strong>conditional requests</strong> correctly to ensure that
|
|
full responses do not have to be generated on a regular basis.</p>
|
|
|
|
<p>Content that varies based on client provided request headers can be
|
|
cached through intelligent use of the <code>Vary</code> response
|
|
header.</p>
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>Variable/Negotiated Content</title>
|
|
|
|
<p>When the origin server is designed to respond with different content
|
|
based on the value of headers in the request, for example to serve
|
|
multiple languages at the same URL, HTTP's caching mechanism makes it
|
|
possible to cache multiple variants of the same page at the same URL.</p>
|
|
|
|
<p>This is done by the origin server adding a <code>Vary</code> header
|
|
to indicate which headers must be taken into account by a cache when
|
|
determining whether two variants are different from one another.</p>
|
|
|
|
<p>If, for example, a response is received with a <code>Vary</code> header such as:</p>
|
|
|
|
<example>
|
|
Vary: negotiate,accept-language,accept-charset
|
|
</example>
|
|
|
|
<p><module>mod_cache</module> will only serve the cached content to
|
|
requesters with accept-language and accept-charset headers
|
|
matching those of the original request.</p>
|
|
|
|
<p>Multiple variants of the content can be cached side by side.
|
|
<module>mod_cache</module> uses the <code>Vary</code> header and the
|
|
corresponding values of the request headers listed by <code>Vary</code>
|
|
to decide on which of many variants to return to the client.</p>
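
<p>If the origin application does not emit the <code>Vary</code> header
itself, a sketch of adding it with <module>mod_headers</module> (assumed
to be loaded) is shown below. The path is a placeholder.</p>

<highlight language="config">
&lt;Location "/docs"&gt;
    # Tell caches that the response depends on the request's Accept-Language
    Header merge Vary "Accept-Language"
&lt;/Location&gt;
</highlight>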
|
|
</section>
|
|
|
|
</section>
|
|
|
|
<section id="examples">
|
|
|
|
<title>Cache Setup Examples</title>
|
|
|
|
<related>
|
|
<modulelist>
|
|
<module>mod_cache</module>
|
|
<module>mod_cache_disk</module>
|
|
<module>mod_cache_socache</module>
|
|
<module>mod_socache_memcache</module>
|
|
</modulelist>
|
|
<directivelist>
|
|
<directive module="mod_cache">CacheEnable</directive>
|
|
<directive module="mod_cache_disk">CacheRoot</directive>
|
|
<directive module="mod_cache_disk">CacheDirLevels</directive>
|
|
<directive module="mod_cache_disk">CacheDirLength</directive>
|
|
<directive module="mod_cache_socache">CacheSocache</directive>
|
|
</directivelist>
|
|
</related>
|
|
|
|
<section id="disk">
|
|
<title>Caching to Disk</title>
|
|
|
|
<p>The <module>mod_cache</module> module relies on specific backend store
|
|
implementations in order to manage the cache, and for caching to disk
|
|
<module>mod_cache_disk</module> is provided to support this.</p>
|
|
|
|
<p>Typically the module will be configured as follows:</p>
|
|
|
|
<highlight language="config">
|
|
CacheRoot "/var/cache/apache/"
|
|
CacheEnable disk /
|
|
CacheDirLevels 2
|
|
CacheDirLength 1
|
|
</highlight>
|
|
|
|
<p>Importantly, as the cached files are locally stored, operating system
|
|
in-memory caching will typically be applied to their access also. So
|
|
although the files are stored on disk, if they are frequently accessed
|
|
it is likely the operating system will ensure that they are actually
|
|
served from memory.</p>
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>Understanding the Cache-Store</title>
|
|
|
|
<p>To store items in the cache, <module>mod_cache_disk</module> creates
|
|
a 22 character hash of the URL being requested. This hash incorporates
|
|
the hostname, protocol, port, path and any CGI arguments to the URL,
|
|
as well as elements defined by the Vary header to ensure that multiple
|
|
URLs do not collide with one another.</p>
|
|
|
|
<p>Each character may be any one of 64 different characters, which means
|
|
that overall there are 64^22 possible hashes. For example, a URL might
|
|
be hashed to <code>xyTGxSMO2b68mBCykqkp1w</code>. This hash is used
|
|
as a prefix for the naming of the files specific to that URL within
|
|
the cache, however first it is split up into directories as per
|
|
the <directive module="mod_cache_disk">CacheDirLevels</directive> and
|
|
<directive module="mod_cache_disk">CacheDirLength</directive>
|
|
directives.</p>
|
|
|
|
<p><directive module="mod_cache_disk">CacheDirLevels</directive>
|
|
specifies how many levels of subdirectory there should be, and
|
|
<directive module="mod_cache_disk">CacheDirLength</directive>
|
|
specifies how many characters should be in each directory. With
|
|
the example settings given above, the hash would be turned into
|
|
a filename prefix as
|
|
<code>/var/cache/apache/x/y/TGxSMO2b68mBCykqkp1w</code>.</p>
|
|
|
|
<p>The overall aim of this technique is to reduce the number of
|
|
subdirectories or files that may be in a particular directory,
|
|
as most file-systems slow down as this number increases. With
|
|
a setting of "1" for
|
|
<directive module="mod_cache_disk">CacheDirLength</directive>
|
|
there can at most be 64 subdirectories at any particular level.
|
|
With a setting of 2 there can be 64 * 64 subdirectories, and so on.
|
|
Unless you have a good reason not to, using a setting of "1"
|
|
for <directive module="mod_cache_disk">CacheDirLength</directive>
|
|
is recommended.</p>
|
|
|
|
<p>Setting
|
|
<directive module="mod_cache_disk">CacheDirLevels</directive>
|
|
depends on how many files you anticipate to store in the cache.
|
|
With the setting of "2" used in the above example, a grand
|
|
total of 4096 subdirectories can ultimately be created. With
|
|
1 million URLs cached, this works out at roughly 244 cached
|
|
URLs per directory.</p>
|
|
|
|
<p>Each URL uses at least two files in the cache-store. Typically
|
|
there is a ".header" file, which includes meta-information about
|
|
the URL, such as when it is due to expire and a ".data" file
|
|
which is a verbatim copy of the content to be served.</p>
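
<p>Continuing with the example hash and settings used above, the two
files for that URL would be expected to appear as follows. This listing
is derived from the description in this section rather than captured from
a running server.</p>

<example><pre>
/var/cache/apache/x/y/TGxSMO2b68mBCykqkp1w.header
/var/cache/apache/x/y/TGxSMO2b68mBCykqkp1w.data</pre>
</example>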
|
|
|
|
<p>In the case of content negotiated via the "Vary" header, a
|
|
".vary" directory will be created for the URL in question. This
|
|
directory will have multiple ".data" files corresponding to the
|
|
differently negotiated content.</p>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Maintaining the Disk Cache</title>
|
|
|
|
<p>The <module>mod_cache_disk</module> module makes no attempt to
|
|
regulate the amount of disk space used by the cache, although it
|
|
will gracefully stand down on any disk error and behave as if the
|
|
cache was never present.</p>
|
|
|
|
<p>Instead, provided with httpd is the <a
|
|
href="programs/htcacheclean.html">htcacheclean</a> tool which allows you
|
|
to clean the cache periodically. Determining how frequently to run <a
|
|
href="programs/htcacheclean.html">htcacheclean</a> and what target size to
|
|
use for the cache is somewhat complex and trial and error may be needed to
|
|
select optimal values.</p>
|
|
|
|
<p><a href="programs/htcacheclean.html">htcacheclean</a> has two modes of
|
|
operation. It can be run as a persistent daemon, or periodically from
|
|
cron. <a
|
|
href="programs/htcacheclean.html">htcacheclean</a> can take up to an hour
|
|
or more to process very large (tens of gigabytes) caches and if you are
|
|
running it from cron it is recommended that you determine how long a typical
|
|
run takes, to avoid running more than one instance at a time.</p>
|
|
|
|
<p>It is also recommended that an appropriate "nice" level is chosen for
|
|
htcacheclean so that the tool does not cause excessive disk I/O while the
|
|
server is running.</p>
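
<p>The two modes might be invoked as sketched below. The cache path
matches the earlier <directive module="mod_cache_disk">CacheRoot</directive>
example, while the 30 minute interval and the 100 MB limit are arbitrary
values that should be tuned for your site. The <code>-n</code> option asks
the tool to be nice to other programs, and <code>-t</code> removes empty
directories.</p>

<example><pre>
# Daemon mode: check every 30 minutes, cleaning only if the cache has changed
htcacheclean -d30 -n -t -i -p/var/cache/apache -l100M

# One-off run, suitable for cron
htcacheclean -n -t -p/var/cache/apache -l100M</pre>
</example>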
|
|
|
|
<p class="figure">
|
|
<img src="images/caching_fig1.gif" alt="" width="600"
|
|
height="406" /><br />
|
|
<a id="figure1" name="figure1"><dfn>Figure 1</dfn></a>: Typical
|
|
cache growth / clean sequence.</p>
|
|
|
|
<p>Because <module>mod_cache_disk</module> does not itself pay attention
|
|
to how much space is used, you should ensure that
|
|
<a href="programs/htcacheclean.html">htcacheclean</a> is configured to
|
|
leave enough "grow room" following a clean.</p>
|
|
</section>
|
|
|
|
<section id="memcache">
|
|
<title>Caching to memcached</title>
|
|
|
|
<p>Using the <module>mod_cache_socache</module> module, <module>mod_cache</module>
|
|
can cache data using a variety of implementations (also known as "providers"). Using the
|
|
<module>mod_socache_memcache</module> module, for example, one can specify that
|
|
<a href="http://memcached.org">memcached</a> is to be used as the
|
|
backend storage mechanism.</p>
|
|
|
|
<p>Typically the module will be configured as follows:</p>
|
|
|
|
<highlight language="config">
|
|
CacheEnable socache /
|
|
CacheSocache memcache:memcd.example.com:11211
|
|
</highlight>
|
|
|
|
<p>Additional <code>memcached</code> servers can be specified by
|
|
appending them to the end of the <code>CacheSocache memcache:</code>
|
|
line separated by commas:</p>
|
|
|
|
<highlight language="config">
|
|
CacheEnable socache /
|
|
CacheSocache memcache:mem1.example.com:11211,mem2.example.com:11212
|
|
</highlight>
|
|
|
|
<p>This format is also used with the various other <module>mod_cache_socache</module>
|
|
providers. For example:</p>
|
|
|
|
<highlight language="config">
|
|
CacheEnable socache /
|
|
CacheSocache shmcb:/path/to/datafile(512000)
|
|
</highlight>
|
|
|
|
<highlight language="config">
|
|
CacheEnable socache /
|
|
CacheSocache dbm:/path/to/datafile
|
|
</highlight>
|
|
|
|
</section>
|
|
|
|
</section>
|
|
|
|
<section id="socache-caching">
|
|
|
|
<title>General Two-state Key/Value Shared Object Caching</title>
|
|
|
|
<related>
|
|
<modulelist>
|
|
<module>mod_authn_socache</module>
|
|
<module>mod_socache_dbm</module>
|
|
<module>mod_socache_dc</module>
|
|
<module>mod_socache_memcache</module>
|
|
<module>mod_socache_shmcb</module>
|
|
<module>mod_ssl</module>
|
|
</modulelist>
|
|
<directivelist>
|
|
<directive module="mod_authn_socache">AuthnCacheSOCache</directive>
|
|
<directive module="mod_ssl">SSLSessionCache</directive>
|
|
<directive module="mod_ssl">SSLStaplingCache</directive>
|
|
</directivelist>
|
|
</related>
|
|
|
|
<p>The Apache HTTP server offers a low level shared object cache for
|
|
caching information such as SSL sessions, or authentication credentials,
|
|
within the <a href="socache.html">socache</a> interface.</p>
|
|
|
|
<p>Additional modules are provided for each implementation, offering the
|
|
following backends:</p>
|
|
|
|
<dl>
|
|
<dt><module>mod_socache_dbm</module></dt>
|
|
<dd>DBM based shared object cache.</dd>
|
|
<dt><module>mod_socache_dc</module></dt>
|
|
<dd>Distcache based shared object cache.</dd>
|
|
<dt><module>mod_socache_memcache</module></dt>
|
|
<dd>Memcache based shared object cache.</dd>
|
|
<dt><module>mod_socache_shmcb</module></dt>
|
|
<dd>Shared memory based shared object cache.</dd>
|
|
</dl>
|
|
|
|
<section id="mod_authn_socache-caching">
|
|
<title>Caching Authentication Credentials</title>
|
|
|
|
<related>
|
|
<modulelist>
|
|
<module>mod_authn_socache</module>
|
|
</modulelist>
|
|
<directivelist>
|
|
<directive module="mod_authn_socache">AuthnCacheSOCache</directive>
|
|
</directivelist>
|
|
</related>
|
|
|
|
<p>The <module>mod_authn_socache</module> module allows the result of
|
|
authentication to be cached, relieving load on authentication backends.</p>
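
<p>A minimal sketch follows, assuming a password file checked by
<module>mod_authn_file</module> and a shared memory cache provided by
<module>mod_socache_shmcb</module>. The paths and the realm name are
placeholders.</p>

<highlight language="config">
AuthnCacheSOCache shmcb

&lt;Directory "/www/htdocs/example/"&gt;
    AuthType Basic
    AuthName "Cached Authentication Example"
    # Consult the cache first, then fall back to the password file
    AuthBasicProvider socache file
    AuthUserFile "/usr/local/apache2/conf/htpasswd"
    # Cache credentials that were verified by the file provider
    AuthnCacheProvideFor file
    Require valid-user
&lt;/Directory&gt;
</highlight>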
|
|
|
|
</section>
|
|
|
|
<section id="mod_ssl-caching">
|
|
<title>Caching SSL Sessions</title>
|
|
|
|
<related>
|
|
<modulelist>
|
|
<module>mod_ssl</module>
|
|
</modulelist>
|
|
<directivelist>
|
|
<directive module="mod_ssl">SSLSessionCache</directive>
|
|
<directive module="mod_ssl">SSLStaplingCache</directive>
|
|
</directivelist>
|
|
</related>
|
|
|
|
<p>The <module>mod_ssl</module> module uses the <code>socache</code> interface
|
|
to provide a session cache and a stapling cache.</p>
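
<p>For example, both caches could be placed in shared memory using
<module>mod_socache_shmcb</module>. The file paths and sizes below are
illustrative.</p>

<highlight language="config">
# Session cache of roughly 512 KB in shared memory
SSLSessionCache "shmcb:/usr/local/apache2/logs/ssl_gcache_data(512000)"

# OCSP stapling responses are cached through the same interface
SSLUseStapling On
SSLStaplingCache "shmcb:/usr/local/apache2/logs/ssl_stapling(32768)"
</highlight>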
|
|
|
|
</section>
|
|
|
|
</section>
|
|
|
|
<section id="file-caching">
|
|
|
|
<title>Specialized File Caching</title>
|
|
|
|
<related>
|
|
<modulelist>
|
|
<module>mod_file_cache</module>
|
|
</modulelist>
|
|
<directivelist>
|
|
<directive module="mod_file_cache">CacheFile</directive>
|
|
<directive module="mod_file_cache">MMapFile</directive>
|
|
</directivelist>
|
|
</related>
|
|
|
|
<p>On platforms where a filesystem might be slow, or where file
|
|
handles are expensive, the option exists to pre-load files into
|
|
memory on startup.</p>
|
|
|
|
<p>On systems where opening files is slow, the option exists to
|
|
open the file on startup and cache the file handle. These
|
|
options can help on systems where access to static files is
|
|
slow.</p>
|
|
|
|
<section id="filehandle">
|
|
<title>File-Handle Caching</title>
|
|
|
|
<p>The act of opening a file can itself be a source of delay, particularly
|
|
on network filesystems. By maintaining a cache of open file descriptors
|
|
for commonly served files, httpd can avoid this delay. Currently httpd
|
|
provides one implementation of File-Handle Caching.</p>
|
|
|
|
<section>
|
|
<title>CacheFile</title>
|
|
|
|
<p>The most basic form of caching present in httpd is the file-handle
|
|
caching provided by <module>mod_file_cache</module>. Rather than caching
|
|
file-contents, this cache maintains a table of open file descriptors. Files
|
|
to be cached in this manner are specified in the configuration file using
|
|
the <directive module="mod_file_cache">CacheFile</directive>
|
|
directive.</p>
|
|
|
|
<p>The
|
|
<directive module="mod_file_cache">CacheFile</directive> directive
|
|
instructs httpd to open the file when it is started and to re-use
|
|
this file-handle for all subsequent access to this file.</p>
|
|
|
|
<highlight language="config">
|
|
CacheFile /usr/local/apache2/htdocs/index.html
|
|
</highlight>
|
|
|
|
<p>If you intend to cache a large number of files in this manner, you
|
|
must ensure that your operating system's limit for the number of open
|
|
files is set appropriately.</p>
|
|
|
|
<p>Although using <directive module="mod_file_cache">CacheFile</directive>
|
|
does not cause the file-contents to be cached per-se, it does mean
|
|
that if the file changes while httpd is running these changes will
|
|
not be picked up. The file will be consistently served as it was
|
|
when httpd was started.</p>
|
|
|
|
<p>If the file is removed while httpd is running, it will continue
|
|
to maintain an open file descriptor and serve the file as it was when
|
|
httpd was started. This usually also means that although the file
has been deleted and no longer shows up on the filesystem, the disk
space it occupied will not be recovered until httpd is stopped and the file
|
|
descriptor closed.</p>
|
|
</section>
|
|
|
|
</section>
|
|
|
|
<section id="inmemory">
|
|
<title>In-Memory Caching</title>
|
|
|
|
<p>Serving directly from system memory is universally the fastest method
|
|
of serving content. Reading files from a disk controller or, even worse,
|
|
from a remote network is orders of magnitude slower. Disk controllers
|
|
usually involve physical processes, and network access is limited by
|
|
your available bandwidth. Memory access, on the other hand, can take mere
nanoseconds.</p>
|
|
|
|
<p>System memory isn't cheap, though. Byte for byte, it's by far the most
expensive type of storage, and it's important to ensure that it is used
efficiently. By caching files in memory you decrease the amount of
|
|
memory available on the system. As we'll see, in the case of operating
|
|
system caching, this is not so much of an issue, but when using
|
|
httpd's own in-memory caching it is important to make sure that you
|
|
do not allocate too much memory to a cache. Otherwise the system
|
|
will be forced to swap out memory, which will likely degrade
|
|
performance.</p>
|
|
|
|
<section>
|
|
<title>Operating System Caching</title>
|
|
|
|
<p>Almost all modern operating systems cache file-data in memory managed
|
|
directly by the kernel. This is a powerful feature, and for the most
|
|
part operating systems get it right. For example, on Linux, let's look at
|
|
the difference in the time it takes to read a file for the first time
|
|
and the second time:</p>
|
|
|
|
<example><pre>
|
|
colm@coroebus:~$ time cat testfile > /dev/null
|
|
real 0m0.065s
|
|
user 0m0.000s
|
|
sys 0m0.001s
|
|
colm@coroebus:~$ time cat testfile > /dev/null
|
|
real 0m0.003s
|
|
user 0m0.003s
|
|
sys 0m0.000s</pre>
|
|
</example>
|
|
|
|
<p>Even for this small file, there is a huge difference in the amount
|
|
of time it takes to read the file. This is because the kernel has cached
|
|
the file contents in memory.</p>
|
|
|
|
<p>By ensuring there is "spare" memory on your system, you can ensure
|
|
that more and more file-contents will be stored in this cache. This
|
|
can be a very efficient means of in-memory caching, and involves no
|
|
extra configuration of httpd at all.</p>
|
|
|
|
<p>Additionally, because the operating system knows when files are
|
|
deleted or modified, it can automatically remove file contents from the
|
|
cache when necessary. This is a big advantage over httpd's in-memory
|
|
caching which has no way of knowing when a file has changed.</p>
|
|
</section>
|
|
|
|
<p>Despite the performance and advantages of automatic operating system
caching, there are some circumstances in which in-memory caching may be
|
|
better performed by httpd.</p>
|
|
|
|
<section>
|
|
<title>MMapFile Caching</title>
|
|
|
|
<p><module>mod_file_cache</module> provides the
|
|
<directive module="mod_file_cache">MMapFile</directive> directive, which
|
|
allows you to have httpd map a static file's contents into memory at
|
|
start time (using the mmap system call). httpd will use the in-memory
|
|
contents for all subsequent accesses to this file.</p>
|
|
|
|
<highlight language="config">
|
|
MMapFile /usr/local/apache2/htdocs/index.html
|
|
</highlight>
|
|
|
|
<p>As with the
|
|
<directive module="mod_file_cache">CacheFile</directive> directive, any
|
|
changes in these files will not be picked up by httpd after it has
|
|
started.</p>
|
|
|
|
<p>The <directive module="mod_file_cache">MMapFile</directive>
directive does not keep track of how much memory it allocates, so
you must take care not to overuse the directive. Each httpd child
|
|
process will replicate this memory, so it is critically important
|
|
to ensure that the files mapped are not so large as to cause the
|
|
system to swap memory.</p>
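<p>A conservative sketch, assuming <module>mod_file_cache</module> is built
as a shared module in the conventional <code>modules/</code> directory, maps
only a couple of small, rarely changing files. The <code>favicon.ico</code>
path is hypothetical.</p>

<highlight language="config">
# Assumes the module is built as a DSO in the default modules directory
LoadModule file_cache_module modules/mod_file_cache.so

# Map only small static files; httpd does not cap the total size mapped
MMapFile /usr/local/apache2/htdocs/index.html
MMapFile /usr/local/apache2/htdocs/favicon.ico
</highlight>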
|
|
</section>
|
|
</section>
|
|
|
|
</section>
|
|
|
|
<section id="security">
|
|
<title>Security Considerations</title>
|
|
|
|
<section>
|
|
<title>Authorization and Access Control</title>
|
|
|
|
<p>Using <module>mod_cache</module> in its default state where
|
|
<directive module="mod_cache">CacheQuickHandler</directive> is set to
|
|
<code>On</code> is very much like having a caching reverse-proxy bolted
|
|
to the front of the server. Requests will be served by the caching module
|
|
unless it determines that the origin server should be queried, just as an
external cache would. This drastically changes the security model of
httpd.</p>
|
|
|
|
<p>As traversing a filesystem hierarchy to examine potential
|
|
<code>.htaccess</code> files would be a very expensive operation,
|
|
partially defeating the point of caching (to speed up requests),
|
|
<module>mod_cache</module> makes no decision about whether a cached
|
|
entity is authorised for serving. In other words, if
|
|
<module>mod_cache</module> has cached some content, it will be served
|
|
from the cache as long as that content has not expired.</p>
|
|
|
|
<p>If, for example, your configuration permits access to a resource by IP
address, you should ensure that this content is not cached. You can do this
|
|
by using the <directive module="mod_cache">CacheDisable</directive>
|
|
directive, or <module>mod_expires</module>. Left unchecked,
|
|
<module>mod_cache</module> - very much like a reverse proxy - would cache
|
|
the content when served and then serve it to any client, on any IP
|
|
address.</p>
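<p>One way this might look, assuming <module>mod_cache_disk</module> is
already configured with a cache root and the whole site is cached, is to
exclude a hypothetical <code>/private/</code> area that is restricted to an
example address range:</p>

<highlight language="config">
# Cache everything under / on disk ...
CacheEnable disk /
# ... except the IP-restricted area (hypothetical path)
CacheDisable /private/

&lt;Location "/private/"&gt;
    # 192.0.2.0/24 is a documentation-only address range
    Require ip 192.0.2.0/24
&lt;/Location&gt;
</highlight>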
|
|
|
|
<p>When the <directive module="mod_cache">CacheQuickHandler</directive>
|
|
directive is set to <code>Off</code>, the full set of request processing
|
|
phases are executed and the security model remains unchanged.</p>
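<p>A minimal sketch of this mode, again assuming
<module>mod_cache_disk</module> is already configured, runs the cache as a
normal handler so that access control is evaluated before a cached response
is served. The <code>/app</code> path and the address range are
hypothetical.</p>

<highlight language="config">
# Run the cache as a normal handler instead of the quick handler
CacheQuickHandler off

&lt;Location "/app"&gt;
    CacheEnable disk
    # Still enforced for cached responses, because all phases run
    Require ip 192.0.2.0/24
&lt;/Location&gt;
</highlight>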
|
|
</section>
|
|
|
|
<section>
|
|
<title>Local exploits</title>
|
|
|
|
<p>As requests from end-users can be served from the cache, the cache
itself can become a target for those wishing to deface or interfere with
content. It is important to bear in mind that the cache must at all
times be writable by the user that httpd runs as. This is in
|
|
stark contrast to the usually recommended situation of maintaining
|
|
all content unwritable by the Apache user.</p>
|
|
|
|
<p>If the Apache user is compromised, for example through a flaw in
|
|
a CGI process, it is possible that the cache may be targeted. When
|
|
using <module>mod_cache_disk</module>, it is relatively easy to
|
|
insert or modify a cached entity.</p>
|
|
|
|
<p>This presents a somewhat elevated risk in comparison to the other
|
|
types of attack it is possible to make as the Apache user. If you are
using <module>mod_cache_disk</module> you should bear this in mind:
ensure you upgrade httpd when security upgrades are announced and
|
|
run CGI processes as a non-Apache user using <a
|
|
href="suexec.html">suEXEC</a> if possible.</p>
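<p>As an illustration only, and assuming httpd was built with suEXEC support
and <module>mod_suexec</module> is loaded, CGI programs in a virtual host
could be run under a separate account that has no write access to the cache
directory. The host name, user and group below are hypothetical.</p>

<highlight language="config">
&lt;VirtualHost *:80&gt;
    ServerName www.example.com
    # CGI programs run as this user and group rather than the Apache user
    SuexecUserGroup cgiuser cgigroup
&lt;/VirtualHost&gt;
</highlight>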
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>Cache Poisoning</title>
|
|
|
|
<p>When running httpd as a caching proxy server, there is also the
|
|
potential for so-called cache poisoning. Cache Poisoning is a broad
|
|
term for attacks in which an attacker causes the proxy server to
|
|
retrieve incorrect (and usually undesirable) content from the origin
|
|
server.</p>
|
|
|
|
<p>For example, if the DNS servers used by your system running httpd
|
|
are vulnerable to DNS cache poisoning, an attacker may be able to control
|
|
where httpd connects to when requesting content from the origin server.
|
|
Another example is so-called HTTP request-smuggling attacks.</p>
|
|
|
|
<p>This document is not the correct place for an in-depth discussion
|
|
of HTTP request smuggling (instead, try your favourite search engine).
However, it is important to be aware that it is possible to make
|
|
a series of requests, and to exploit a vulnerability on an origin
|
|
webserver such that the attacker can entirely control the content
|
|
retrieved by the proxy.</p>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Denial of Service / Cachebusting</title>
|
|
|
|
<p>The Vary mechanism allows multiple variants of the same URL to be
|
|
cached side by side. Depending on header values provided by the client,
|
|
the cache will select the correct variant to return to the client. This
|
|
mechanism can become a problem when an attempt is made to vary on a
|
|
header that is known to contain a wide range of possible values under
|
|
normal use, for example the <code>User-Agent</code> header. Depending
|
|
on the popularity of the particular web site, thousands or millions of
|
|
duplicate cache entries could be created for the same URL, crowding
|
|
out other entries in the cache.</p>
|
|
|
|
<p>In other cases, there may be a need to change the URL of a particular
|
|
resource on every request, usually by adding a "cachebuster" string to
|
|
the URL. If this content is declared cacheable by a server for a
|
|
significant freshness lifetime, these entries can crowd out
|
|
legitimate entries in a cache. While <module>mod_cache</module>
|
|
provides a
|
|
<directive module="mod_cache">CacheIgnoreURLSessionIdentifiers</directive>
|
|
directive, this directive should be used with care to ensure that
|
|
downstream proxy or browser caches aren't subjected to the same denial
|
|
of service issue.</p>
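<p>For illustration, assuming the cachebuster arrives as a hypothetical URL
parameter named <code>cb</code>, the directive could be used as follows so
that URLs differing only in that identifier map to one cache entry.</p>

<highlight language="config">
# Ignore the hypothetical "cb" identifier when building the cache key
CacheIgnoreURLSessionIdentifiers cb
</highlight>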
|
|
</section>
|
|
</section>
|
|
|
|
</manualpage>
|