Break out the tar parsing parts into a function for just that and create
a new one that knows how to load directly from the html directory in the
PostgreSQL source tree, for more efficient snapshot loading.
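A rough sketch of that split, assuming both sources can be reduced to a stream of (filename, bytes) pairs that a shared loader then consumes. The function names are hypothetical and not taken from the actual script:

```python
import os
import tarfile


def pages_from_tarball(tar_path):
    # Yield (filename, raw bytes) for every HTML page inside a tarball.
    with tarfile.open(tar_path) as tf:
        for member in tf:
            if member.isfile() and member.name.endswith(".html"):
                yield os.path.basename(member.name), tf.extractfile(member).read()


def pages_from_directory(html_dir):
    # Yield the same (filename, raw bytes) pairs straight from a built
    # html directory, skipping the tarball step entirely.
    for name in sorted(os.listdir(html_dir)):
        full = os.path.join(html_dir, name)
        if name.endswith(".html") and os.path.isfile(full):
            with open(full, "rb") as f:
                yield name, f.read()
```

Either generator can feed the same per-page loading code, so only the source differs.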
This will require some further updates on the loading side of things
before it's fully valid, but for now track and show a link to the git
hash used to build developer docs *if* one is specified.
We only track it for devel (because releases have release numbers) and
we only show it in the cases where we would already show the loading
time.
Instead of printing every single page loaded, print the start of the
process and the statistics by default.
The existing --quiet parameter continues to work to make the process
completely quiet.
Add a new parameter --verbose that makes it run in the old way, printing
everything.
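The three output levels could be wired up roughly like this. Only the --quiet and --verbose names come from the description above; the helper names, and the choice to let --quiet win when both flags are given, are assumptions:

```python
import argparse


def build_parser():
    # Illustrative parser; the real script takes other arguments as well.
    parser = argparse.ArgumentParser(description="Load docs (illustrative)")
    parser.add_argument("--quiet", action="store_true", help="print nothing at all")
    parser.add_argument("--verbose", action="store_true", help="print every page loaded")
    return parser


def show_summary(args):
    # Start-of-run line and statistics: shown by default, hidden by --quiet.
    return not args.quiet


def show_pages(args):
    # Per-page output: only with --verbose.
    # Assumption: --quiet wins if both flags are given.
    return args.verbose and not args.quiet
```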
We already avoided the actual updates (by filtering the UPDATE
statement), but since we set the load date we'd trigger a change to
every page and kick it out of the caches even when not changed, which is
wasteful. So instead only do that if something has changed. When it has,
we still reset that whole version of the docs since we want the load
date to be consistent across them.
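The decision can be sketched in plain Python, assuming the stored and incoming pages are available as dicts mapping filename to content. The real loader makes this decision in SQL; these helper names are hypothetical, and deleted pages are ignored here:

```python
def pages_needing_update(existing, incoming):
    # Names of pages whose content differs from what is already stored.
    # Pages that are new in `incoming` count as changed.
    return sorted(
        name for name, content in incoming.items()
        if existing.get(name) != content
    )


def should_reset_load_date(existing, incoming):
    # Touch the load date for the whole version only if at least one
    # page actually changed; otherwise leave it (and the caches) alone.
    return bool(pages_needing_update(existing, incoming))
```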
Issuing individual INSERTs for each line in the docs works decently when
local, but is slow when loading across The Tubes Of The Internet.
Switching to using COPY takes the load time from the buildfarm animal
from just over 2 minutes to about 6-7 seconds.
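A minimal sketch of the COPY approach, assuming a psycopg2 cursor. The table and column names are placeholders, not the real schema. All rows are serialized into one in-memory buffer and sent in a single round trip rather than one statement per row:

```python
import io


def rows_to_copy_buffer(rows):
    # Serialize rows into PostgreSQL's text COPY format: tab-separated
    # columns, one row per line, with backslash, tab, newline and
    # carriage return escaped, and None written as \N.
    def escape(value):
        if value is None:
            return "\\N"
        return (str(value)
                .replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(escape(v) for v in row) + "\n")
    buf.seek(0)
    return buf


def copy_rows(cursor, rows):
    # Hypothetical table and columns. copy_expert() is psycopg2's
    # cursor method for running COPY ... FROM STDIN against a file object.
    cursor.copy_expert(
        "COPY docs_staging (file, version, title, content) FROM STDIN",
        rows_to_copy_buffer(rows),
    )
```

The saving comes from network latency: individual INSERTs pay one round trip per row, while COPY streams everything in one command.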
It seems the newer tidy in Debian Stretch breaks with the output from
the old docs toolchain, causing indentation to happen inside <pre> blocks
which clearly breaks rendering.
Turn it off for those, but keep it enabled for version 11 and up (at this
point that's just dev), because the output becomes a lot easier to read
when trying to debug things.
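The version switch might look like the sketch below. The function name and the is_devel flag are assumptions about how the version is represented; "indent" is a standard HTML Tidy option that accepts yes, no and auto:

```python
def tidy_options(version, is_devel=False):
    # Output from the old docs toolchain is not re-indented, so that
    # <pre> blocks are left untouched. Version 11 and up (and devel)
    # keep indentation because it makes the HTML easier to debug.
    indent = "auto" if (is_devel or version >= 11) else "no"
    return {"indent": indent}
```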
This removes the dependency on django from docload, facilitating
incremental upgrades of the infrastructure.
This now requires a new docload.ini file in the tools/docs directory,
with a section "db" and a setting "dsn".
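Reading that file needs only the standard library. This is a sketch; the helper name is hypothetical, and the dsn value in the comment is a placeholder rather than a real connection string:

```python
import configparser

# Example docload.ini (placeholder value):
#
#   [db]
#   dsn=dbname=example


def read_dsn(path):
    # Return the "dsn" setting from the "db" section of the ini file.
    config = configparser.ConfigParser()
    if not config.read(path):
        # ConfigParser.read() silently skips missing files, so check.
        raise FileNotFoundError(path)
    return config.get("db", "dsn")
```

The returned string can then be passed to the database driver's connect call.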
This makes it possible to figure out when the docs were actually
loaded, since developer docs don't carry a version number. This is
actually going to be the docs *load* timestamp, and not build timestamp,
but they should be close enough together that it shouldn't matter.
Fixes #108