mirror of
https://github.com/apache/httpd.git
synced 2025-08-01 16:41:19 +00:00

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/branches/2.4.x@1425939 13f79535-47bb-0310-9956-ffa450edef68
175 lines
7.9 KiB
XML
<?xml version="1.0"?>
<!DOCTYPE modulesynopsis SYSTEM "../style/modulesynopsis.dtd">
<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
<!-- $LastChangedRevision$ -->

<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<modulesynopsis metafile="mod_xml2enc.xml.meta">

<name>mod_xml2enc</name>
<description>Enhanced charset/internationalisation support for libxml2-based
filter modules</description>
<status>Base</status>
<sourcefile>mod_xml2enc.c</sourcefile>
<identifier>xml2enc_module</identifier>
<compatibility>Version 2.4 and later. Available as a third-party module
for 2.2.x versions</compatibility>

<summary>
<p>This module provides enhanced internationalisation support for
markup-aware filter modules such as <module>mod_proxy_html</module>.
It can automatically detect the encoding of input data and ensure
they are correctly processed by the <a href="http://xmlsoft.org/"
>libxml2</a> parser, including converting to Unicode (UTF-8) where
necessary. It can also convert data to an encoding of choice
after markup processing, and will ensure the correct <var>charset</var>
value is set in the HTTP <var>Content-Type</var> header.</p>
</summary>

<section id="usage"><title>Usage</title>
<p>There are two usage scenarios: with modules programmed to work
with mod_xml2enc, and with those that are not aware of it:</p>
<dl>
<dt>Filter modules enabled for mod_xml2enc</dt><dd>
<p>Modules such as <module>mod_proxy_html</module> version 3.1
and up use the <code>xml2enc_charset</code> optional function to retrieve
the charset argument to pass to the libxml2 parser, and may use the
<code>xml2enc_filter</code> optional function to postprocess to another
encoding. When mod_xml2enc is used with an enabled module, no configuration
is necessary: the other module will configure mod_xml2enc for you
(though you may still want to customise it using the configuration
directives below).</p>
</dd>
<dt>Non-enabled modules</dt><dd>
<p>To use it with a libxml2-based module that isn't explicitly enabled for
mod_xml2enc, you will have to configure the filter chain yourself.
So to use it with a filter <strong>foo</strong> provided by a module
<strong>mod_foo</strong> to improve the latter's i18n support with HTML
and XML, you could use</p>
<pre><code>
FilterProvider iconv xml2enc Content-Type $text/html
FilterProvider iconv xml2enc Content-Type $xml
FilterProvider markup foo Content-Type $text/html
FilterProvider markup foo Content-Type $xml
FilterChain iconv markup
</code></pre>
<p><strong>mod_foo</strong> will now support any character set supported by
libxml2, by apr_xlate/iconv, or by both.</p>
</dd></dl>
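<p>In httpd 2.4, <module>mod_filter</module> takes an expression as the
third argument to <code>FilterProvider</code>, so the chain for a non-enabled
module may need to be written in that form. The following is a minimal
sketch rather than a tested configuration: it assumes the same hypothetical
<strong>foo</strong> filter and matches on the response content type.</p>
<highlight language="config">
# Assumption: "foo" is the placeholder filter from the example above,
# and "iconv" and "markup" are just local names for the two smart filters.
FilterDeclare iconv
FilterDeclare markup
FilterProvider iconv xml2enc "%{CONTENT_TYPE} =~ m|text/html|"
FilterProvider iconv xml2enc "%{CONTENT_TYPE} =~ /xml/"
FilterProvider markup foo "%{CONTENT_TYPE} =~ m|text/html|"
FilterProvider markup foo "%{CONTENT_TYPE} =~ /xml/"
# Run charset detection/conversion before the markup filter
FilterChain iconv markup
</highlight>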
</section>

<section id="api"><title>Programming API</title>
<p>Programmers writing libxml2-based filter modules are encouraged to
enable them for mod_xml2enc, to provide strong i18n support for their
users without reinventing the wheel. The programming API is exposed in
<var>mod_xml2enc.h</var>, and a usage example is
<module>mod_proxy_html</module>.</p>
</section>

<section id="sniffing"><title>Detecting an Encoding</title>
<p>Unlike <module>mod_charset_lite</module>, mod_xml2enc is designed
to work with data whose encoding cannot be known in advance and thus
configured. It therefore uses 'sniffing' techniques to detect the
encoding of HTTP data as follows:</p>
<ol>
<li>If the HTTP <var>Content-Type</var> header includes a
<var>charset</var> parameter, that is used.</li>
<li>If the data start with an XML Byte Order Mark (BOM) or an
XML encoding declaration, that is used.</li>
<li>If an encoding is declared in an HTML <code>&lt;META&gt;</code>
element, that is used.</li>
<li>If none of the above match, the default value set by
<directive>xml2EncDefault</directive> is used.</li>
</ol>
<p>The rules are applied in order. As soon as a match is found,
it is used and detection is stopped.</p>
</section>

<section id="output"><title>Output Encoding</title>
<p><a href="http://xmlsoft.org/">libxml2</a> always uses UTF-8 (Unicode)
internally, and libxml2-based filter modules will output that by default.
mod_xml2enc can change the output encoding through the API, but there
is currently no way to configure that directly.</p>
<p>Changing the output encoding should (in theory, at least) never be
necessary, and is not recommended due to the extra processing load on
the server of an unnecessary conversion.</p>
</section>

<section id="alias"><title>Unsupported Encodings</title>
<p>If you are working with encodings that are not supported by any of
the conversion methods available on your platform, you can still alias
them to a supported encoding using <directive>xml2EncAlias</directive>.</p>
</section>

<directivesynopsis>
<name>xml2EncDefault</name>
<description>Sets a default encoding to assume when absolutely no information
can be <a href="#sniffing">automatically detected</a></description>
<syntax>xml2EncDefault <var>name</var></syntax>
<contextlist><context>server config</context>
<context>virtual host</context><context>directory</context>
<context>.htaccess</context></contextlist>
<compatibility>Version 2.4.0 and later; available as a third-party
module for earlier versions.</compatibility>

<usage>
<p>If you are processing data with known encoding but no encoding
information, you can set this default to help mod_xml2enc process
the data correctly. For example, to work with the default value
of Latin1 (<var>iso-8859-1</var>) specified in HTTP/1.0, use</p>
<highlight language="config">xml2EncDefault iso-8859-1</highlight>
</usage>
</directivesynopsis>

<directivesynopsis>
<name>xml2EncAlias</name>
<description>Recognise Aliases for encoding values</description>
<syntax>xml2EncAlias <var>charset alias [alias ...]</var></syntax>
<contextlist><context>server config</context></contextlist>

<usage>
<p>This server-wide directive aliases one or more encodings to another
encoding. This enables encodings not recognised by libxml2 to be handled
internally by libxml2's encoding support using the translation table for
a recognised encoding. This serves two purposes: to support character sets
(or names) not recognised either by libxml2 or iconv, and to skip
conversion for an encoding where it is known to be unnecessary.</p>
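<p>For instance, a backend that labels its output with a nonstandard
charset name could be handled as sketched below. The name
<var>x-backend-latin1</var> is hypothetical and used only for illustration;
substitute whatever label your backend actually sends.</p>
<highlight language="config">
# Treat the hypothetical label x-backend-latin1 as iso-8859-1
xml2EncAlias iso-8859-1 x-backend-latin1
</highlight>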
</usage>
</directivesynopsis>

<directivesynopsis>
<name>xml2StartParse</name>
<description>Advise the parser to skip leading junk.</description>
<syntax>xml2StartParse <var>element [element ...]</var></syntax>
<contextlist><context>server config</context><context>virtual host</context>
<context>directory</context><context>.htaccess</context></contextlist>

<usage>
<p>Specify that the markup parser should start at the first instance
of any of the elements specified. This can be used as a workaround
where a broken backend inserts leading junk that messes up the parser (<a
href="http://bahumbug.wordpress.com/2006/10/12/mod_proxy_html-revisited/"
>example here</a>).</p>
<p>It should never be used for XML, nor well-formed HTML.</p>
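<p>As a sketch of the workaround, assuming a backend that emits stray
bytes ahead of an otherwise usable HTML page, parsing could be told to
begin at the first <code>&lt;html&gt;</code> element. The choice of
element here is an assumption for illustration.</p>
<highlight language="config">
# Element names are assumed to be given without angle brackets
xml2StartParse html
</highlight>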
</usage>
</directivesynopsis>

</modulesynopsis>