Add a whole bunch of basic documentation. I'm sure there's more to do,
but this is at least a start.
Magnus Hagander
2010-06-14 21:40:15 +02:00
parent 820617359e
commit 1a8251e015
7 changed files with 302 additions and 0 deletions

docs/README.rst

Documentation
=============
This directory holds some basic documentation for the pgweb system. It
doesn't claim to be complete in any way, and any contributions to make
it so are always welcome!

docs/authentication.rst

Authentication
==============
The authentication system provides the base for both the community
login system and the django authentication system. The functions
defined in sql/community_login.sql implement the community login
system (existing API) on top of the django authentication, as well as
a function to access all users defined in the old community login
system.

The custom authentication provider pgweb.util.auth.AuthBackend
implements the community login system migration functionality. It will
first attempt to log the user in with the standard django system. If
this fails, it will attempt to log the user in with the *old*
community login system, and if this succeeds, the user will
automatically be migrated to the django authentication system and
removed from the old system.
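
The fallback-and-migrate flow can be sketched in plain Python. MigratingAuthBackend and its two in-memory stores are hypothetical stand-ins for the django user table and the old community login database; a real backend compares password hashes, never plain text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MigratingAuthBackend:
    """Hypothetical sketch: try django first, fall back to the old system.

    Both stores map username -> password in plain text purely for
    illustration; the real backend works with hashed passwords.
    """
    django_users: Dict[str, str] = field(default_factory=dict)
    legacy_users: Dict[str, str] = field(default_factory=dict)

    def authenticate(self, username: str, password: str) -> Optional[str]:
        # 1. Standard django authentication first.
        if self.django_users.get(username) == password:
            return username
        # 2. Fall back to the *old* community login system.
        if self.legacy_users.get(username) == password:
            # 3. Migrate: create the django user, then remove the legacy one.
            self.django_users[username] = password
            del self.legacy_users[username]
            return username
        return None
```

After the first successful legacy login the user exists only in the django store, so every later login takes the first branch.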

In a local installation that does not have access to the existing set
of users, this authentication backend can be disabled completely, and
the system will function perfectly well relying on just the django
authentication system.

docs/batch.rst

Batch jobs
==========
The system relies on a number of batch jobs that run on the webserver
or on another machine, feeding data into the system. These batch jobs
should generally be run under a user account that does *not* have write
permissions in the web directories; any exceptions should be clearly
noted here. Most of the jobs should run regularly from cron, while
others should be run manually when required.

All batch jobs are located in the directory tools/.

docs/docload.py
---------------
This script loads a new set of documentation. Simply specify the
version to load and point it at the tarball to load from. The script
automatically decompresses the tarball as necessary, and also performs
HTML tidying of the documentation (since the HTML generated by the
PostgreSQL build system isn't particularly standards-conforming or
nice-looking).
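
The decompression step can be sketched with the standard library's tarfile module, whose "r:*" mode detects gzip, bzip2 or xz compression automatically. read_html_pages is a hypothetical helper that assumes UTF-8 encoded pages; the HTML tidying step is not shown.

```python
import tarfile
from typing import Dict


def read_html_pages(tarball_path: str) -> Dict[str, str]:
    """Return {member name: HTML text} for every .html file in the tarball.

    Hypothetical helper: mode "r:*" lets tarfile undo any supported
    compression transparently, so the caller need not know how the
    tarball was compressed.
    """
    pages = {}
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".html"):
                data = tar.extractfile(member).read()
                # Assumption: pages are UTF-8; undecodable bytes are replaced.
                pages[member.name] = data.decode("utf-8", errors="replace")
    return pages
```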

ftp/spider_ftp.py
-----------------
This script needs to be run on the *ftp server*, not on the
webserver. It generates a python pickle file that should then be
copied to the webserver to serve as the basis for the ftp browser (so
the entire ftp site does not need to be synced to the webserver).
Copying the pickle file to the webserver is not handled by the script.
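
A sketch of such a spider, assuming the pickle holds a dict mapping each directory (relative to the ftp root) to its files and their sizes. That layout and the function names are hypothetical; the real script may record different metadata.

```python
import os
import pickle
from typing import Dict


def spider(root: str) -> Dict[str, Dict[str, int]]:
    """Map each directory (relative to root) to {filename: size in bytes}."""
    tree = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        rel = os.path.relpath(dirpath, root)  # "." for the root itself
        tree[rel] = {
            name: os.path.getsize(os.path.join(dirpath, name))
            for name in sorted(filenames)
        }
    return tree


def write_pickle(root: str, outfile: str) -> None:
    # The resulting file is what gets copied to the webserver by hand.
    with open(outfile, "wb") as f:
        pickle.dump(spider(root), f)
```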

moderation/moderation_report.py
-------------------------------
This script enumerates all unmoderated objects in the database and, if
any are pending, generates an email to the slaves list to prod the
moderators to do their job.
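
The report logic can be sketched as: build a reminder only when something is pending. The shape of pending (object type to list of titles), the subject line and the addresses are hypothetical, and actually sending the message (for example with smtplib) is left out.

```python
from email.message import EmailMessage
from typing import Dict, List, Optional


def build_report(pending: Dict[str, List[str]],
                 sender: str, recipient: str) -> Optional[EmailMessage]:
    """Return a reminder email listing pending objects, or None if none."""
    lines = []
    for kind, titles in sorted(pending.items()):
        if titles:
            lines.append(f"{kind}: {len(titles)} pending")
            lines.extend(f"  - {title}" for title in titles)
    if not lines:
        return None  # nothing to moderate, so no email is generated
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Objects awaiting moderation"
    msg.set_content("\n".join(lines))
    return msg
```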

rss/fetch_rss_feeds.py
----------------------
This script connects to all the RSS feeds registered in the RSS
application and fetches their articles into the database. It is not
very tolerant of unusual RSS feeds, and requires them to be "nicely
formatted". This is usually not a problem, since only the headlines
are pulled in, not the contents. For a more complete RSS fetcher that
stores the data in a PostgreSQL database, see the "hamn" project that
powers planet.postgresql.org, also available on git.postgresql.org.
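
A minimal sketch of strict headline parsing, using only the standard library's xml.etree.ElementTree and assuming RSS 2.0 feeds with item/title/link elements. parse_headlines is hypothetical, not the code in fetch_rss_feeds.py, and fetching the feed (for example with urllib.request.urlopen) is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_headlines(rss_xml: str) -> List[Tuple[str, str]]:
    """Return (title, link) for each <item> in an RSS 2.0 document.

    Strict by design: a feed that is not well-formed XML raises
    ET.ParseError instead of being guessed at.
    """
    root = ET.fromstring(rss_xml)
    headlines = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        headlines.append((title, link))
    return headlines
```

Only headlines and links are extracted, which matches the note above that article contents are not pulled in.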

docs/django.rst

Django implementation
======================
The pgweb application is a fairly simple django application, since
there is no requirement for advanced logic anywhere on the site.

Coding practices are, as usual, not fully documented here. When
developing new functionality, please look at an existing application
in the pgweb/ directory that does something similar, and follow the
coding practices used there. Functions and classes should be
documented inline, through comments or docstrings.

Database access
---------------
Wherever database access is simple, the django ORM is used to access
the data. In the few places where more advanced queries are
necessary, direct queries to the database are used. There is no
intention of keeping the database structure independent of the
database used; it is all designed for PostgreSQL. Therefore, using
PostgreSQL-specific syntax in these direct queries is not a problem.

Module split
------------
The module split is not particularly strict, and there is a lot of
cross-referencing between the modules. This is expected, since the
applications are not designed to be independent of each other.

Settings
--------
All settings should be listed, along with their default values, in the
shipped settings.py. Local modifications should always be made in the
settings_local.py file (which is listed in .gitignore), to make sure
they are not accidentally committed to the main repository and do not
cause merge conflicts.
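
A minimal sketch of this layout, assuming the common django idiom of star-importing settings_local at the very end of settings.py so that local values override the shipped defaults. The setting names and values below are hypothetical examples.

```python
# settings.py (sketch): every setting is listed here with its default.
# The names and values below are hypothetical examples.
DEBUG = False
SITE_ROOT = "http://www.example.org"
DEFAULT_FROM_EMAIL = "noreply@example.org"

# Per-installation overrides come last, so they win. settings_local.py is
# listed in .gitignore and may not exist at all, hence the fallback.
try:
    from settings_local import *  # noqa: F401,F403
except ImportError:
    pass
```

A developer's settings_local.py would then contain only the lines that differ, for example DEBUG = True.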

Forms
-----
There are some special things to consider when dealing with forms. Any
model for objects that are going to be moderated should inherit from
the PgModel model instead of the regular django.db.models.Model, and
should set its send_notification attribute to True. This will cause
the system to automatically send out a notification to the slaves list
whenever a new object is created or an existing one is modified.

If the form contains any text fields that accept markdown, the
attribute markdown_fields should be set to a tuple containing the
names of these fields. This will cause the system to automatically
generate preview boxes both in the admin interface (provided the model
is properly registered) and on the regular forms.

If the model contains a "submitter" field, it will automatically be
filled in with the current user; be sure to exclude it from the form
itself.

Utilities
---------
The util/ subdirectory contains a set of utility functions and
classes, rather than an actual application. This is where common code
that may be used by multiple modules is kept.

pgweb.util.admin
++++++++++++++++
This module contains functionality to help simplify the admin.py
files. In particular, it contains a MarkdownPreviewAdmin class and a
register_markdown function, which are used to register a model with
the admin interface in a way that gives every text field listed as
markdown capable a preview box in the admin interface.

auth.py
+++++++
This module implements the community login provider for logging into
both the website itself and the admin interface.

bases.py
++++++++
This module implements base classes to inherit from. Specifically, it
implements the PgModel base class that is used to automatically
generate notifications.

contexts.py
+++++++++++
This module implements custom contexts, which are used to implement
the site navigation.

decorators.py
+++++++++++++
This module implements custom decorators used to change view
behavior. These include the ssl_required decorator, which makes a view
require an SSL connection, and the nocache and cache decorators, which
control how long a page can be cached by the frontend servers.
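
The cache decorators can be sketched as follows, assuming they work by setting the Cache-Control s-maxage header on the response (the mechanism described in frontend.rst). The Response class, the keyword arguments of cache(), and nocache mapping to s-maxage=0 are all hypothetical; in pgweb the view would return a django HttpResponse.

```python
import functools
from typing import Callable, Dict


class Response:
    """Hypothetical stand-in for django's HttpResponse (headers only)."""

    def __init__(self, body: str = "") -> None:
        self.body = body
        self.headers: Dict[str, str] = {}


def cache(days: int = 0, hours: int = 0, minutes: int = 0, seconds: int = 0):
    """Allow the frontend caches to keep the response for the given time."""
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds

    def decorator(view: Callable[..., Response]) -> Callable[..., Response]:
        @functools.wraps(view)
        def wrapper(*args, **kwargs) -> Response:
            response = view(*args, **kwargs)
            # s-maxage applies only to shared caches such as varnish.
            response.headers["Cache-Control"] = f"s-maxage={total}"
            return response
        return wrapper
    return decorator


def nocache(view: Callable[..., Response]) -> Callable[..., Response]:
    """Assumed behavior: a zero lifetime, so frontends always refetch."""
    return cache()(view)
```

Usage would look like decorating a front page view with @cache(minutes=10) and a mirror selection view with @nocache.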

helpers.py
++++++++++
This module implements helper functions and classes that wrap standard
django functionality to reduce the amount of code needed, for example
for form, template and XML handling.

middleware.py
+++++++++++++
This module implements a custom django middleware that takes care of
redirecting requests to SSL when required (this is controlled by the
@ssl_required decorator). It also implements the "standard" django
workaround of storing the user who is executing the current request in
thread-local storage.
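
Both middleware duties can be sketched in plain Python. Request, SslAndUserMiddleware and get_current_user are hypothetical names, and process_view() is simplified from django's (request, view_func, view_args, view_kwargs) signature; a real middleware would return an HttpResponseRedirect instead of a URL string.

```python
import threading
from typing import Callable, Optional
from urllib.parse import urlunsplit

# One attribute namespace per thread, so concurrent requests don't mix.
_request_locals = threading.local()


def get_current_user() -> Optional[str]:
    """Return the user executing the request handled by this thread."""
    return getattr(_request_locals, "user", None)


def ssl_required(view: Callable) -> Callable:
    view.ssl_required = True  # marker inspected by the middleware
    return view


class Request:
    """Hypothetical stand-in for django's HttpRequest."""

    def __init__(self, host: str, path: str, secure: bool = False,
                 user: Optional[str] = None) -> None:
        self.host = host
        self.path = path
        self.secure = secure
        self.user = user


class SslAndUserMiddleware:
    def process_view(self, request: Request, view: Callable) -> Optional[str]:
        """Return an https URL to redirect to, or None to run the view."""
        _request_locals.user = request.user
        if getattr(view, "ssl_required", False) and not request.secure:
            return urlunsplit(("https", request.host, request.path, "", ""))
        return None

    def process_response(self, request: Request, response):
        # Don't leak the user into the next request served by this thread.
        _request_locals.user = None
        return response
```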

misc.py
+++++++
This module implements miscellaneous functions, for things like
formatting strings and sending email.

moderation.py
+++++++++++++
This module implements functions related to the moderation of
different objects in the system (objects that are submitted by
end-users and need to be approved before they are shown on the
website).

docs/frontend.rst

Frontend & Backend
==================
The postgresql.org website is designed to run in a frontend/backend
scenario. This achieves both load distribution and redundancy in case
the main webserver (known as wwwmaster) goes offline or becomes
overloaded.

Previous versions of the website used static files on the frontends,
which were spidered at regular intervals and then push-rsynced out to
the frontends. This made the frontends entirely independent of the
backend, and as such very "available". Unfortunately, it made a lot of
the coding difficult and performed very badly (a re-spidering
including documentation and ftp would take more than 6 hours on a fast
machine).

This generation of the website will instead rely on a varnish web
cache running on the frontend servers, configured to cache things for
a long time. It will also run in what's known as "saint mode", which
has varnish keep serving content from the cache, even after it has
expired, if the backend cannot be contacted. We also use "grace mode",
which has varnish send the cached version of a page *while* it's
fetching a new one from the backend.

All forms that require login will be processed directly by the master
server, just like before. These will *always* be processed over SSL,
and as such will not be sent through varnish at all. They will be
accessed under the domain wwwmaster.postgresql.org.

Requests that require *up to the second* content but do *not* require
a login, such as a mirror selection, will be sent through the
frontends (and served under the www.postgresql.org name) but without
caching enabled. Note that in most cases, these should actually be
cached for at least 5 or 10 seconds, to cut off any short-term high
load effects (aka the slashdot effect).

Normal requests are always cached. There is a default cache expiry
that is set on all pages, and a longer default policy for images,
since they are considered never to change. Any view in the django
project can override this default by setting the
"Cache-control: s-maxage=xxx" http header on the response. This is
done by using the @cache() decorator on the view function. The cache
time should be kept lower for pages with frequently updated data, such
as the front page or the survey results page.

Finally, there is a form on the admin web interface that lets the
administrator manually expire specific pages using a varnish process
called *purging*. This has the backend connect to each of the frontend
servers and tell them to remove the matching objects (selected by a
regular expression) right away, so that fresh copies are fetched from
the backend.
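
The fan-out can be sketched as below, assuming the Varnish 2.x management CLI command "purge.url <regexp>". The send() transport and the frontend host names are hypothetical; the transport is injected so the logic can be shown without real servers.

```python
import re
from typing import Callable, Iterable, List


def purge(pattern: str, frontends: Iterable[str],
          send: Callable[[str, str], None]) -> List[str]:
    """Ask every frontend to drop cached URLs matching pattern.

    send(host, command) is a hypothetical transport, for example a
    connection to each varnish management port.
    """
    re.compile(pattern)  # reject an invalid regular expression up front
    purged = []
    for host in frontends:
        send(host, f"purge.url {pattern}")
        purged.append(host)
    return purged
```

Validating the expression before contacting any frontend avoids leaving some caches purged and others not when the pattern is malformed.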

docs/navigation.rst

Navigation
==========
The navigation system is based on a django context called NavContext,
implemented in pgweb.util.contexts.NavContext. This means that all the
menu links in the system are defined in this file
(pgweb/util/contexts.py). Each django view needs to specify the
NavContext in its call to template rendering, which makes the correct
navigation menu show up.

This is one of the parts of the system that can probably be made a lot
easier, leaving much room for future improvement :-)
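
A plain-dict simplification of the idea: one table of menus keyed by site section, and a helper that adds the right menu to a view's template context. SITE_NAVIGATION, nav_context and the menu entries are hypothetical; the real NavContext is a django context class whose exact interface is not documented here.

```python
from typing import Dict, List, Tuple

# Hypothetical menu definitions; the real ones live in
# pgweb/util/contexts.py and differ from these.
SITE_NAVIGATION: Dict[str, List[Tuple[str, str]]] = {
    "about": [("About", "/about/"), ("Press", "/about/press/")],
    "download": [("Downloads", "/download/"), ("FTP browser", "/ftp/")],
}


def nav_context(section: str, **extra) -> dict:
    """Build a template context that carries the menu for one section."""
    context = dict(extra)
    context["navmenu"] = SITE_NAVIGATION.get(section, [])
    return context
```

A view would pass nav_context("about", title=...) to its template rendering call, so every page in a section gets the same menu from a single definition.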

docs/overview.rst

Overview
========

Dynamic content
---------------
The dynamic content of the website is rendered using Django. The
django project is called "pgweb", and consists of a number of
applications. These applications are *not* designed to be
independent, and cross-referencing between them is both allowed and
normal. They should therefore not be expected to work if copied
outside the pgweb environment.

For more details about the django implementation, see the django.rst file.

Static HTML content
-------------------
Pages that don't need any database access or other kinds of logic use
simple HTML templates. Their content is edited as plain HTML, and the
django template engine is used to wrap it in the regular website
framework.

All pages handled this way are stored in templates/pages/, with each
subdirectory mapping to a sub-url. The code for rendering these pages
is found in the fallback() function in pgweb/core/views.py.
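
The URL-to-template mapping can be sketched as below. fallback_template and its naming convention (the last path component becomes name.html inside the matching subdirectory of templates/pages/) are hypothetical; the check that rejects paths escaping that directory is the part any such mapping needs.

```python
import posixpath
from typing import Optional


def fallback_template(url: str) -> Optional[str]:
    """Map a URL such as /about/press/ to a template path under pages/.

    Hypothetical convention; returns None for the bare root and for
    paths that would escape the pages/ directory.
    """
    normalized = posixpath.normpath(url.strip("/"))
    if normalized == "." or normalized == ".." or normalized.startswith("../"):
        return None
    return f"pages/{normalized}.html"
```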

Non-HTML content
----------------
Non-HTML content is stored in the media/ directory, which is served by
django when running under the local webserver, but is expected to be
served directly by the webserver when deployed in production. This
directory has subdirectories for images, css and javascript, as well
as some imported modules.

Note that there is also /adminmedia/, which is linked directly to the
media files of the django administrative interface; these are shipped
with django, not with pgweb.

Non-web content
---------------
Non-web content, such as PDF files and other static data, is kept in
its own git repository, to keep the size of the main repository down
(since some of these files can be very large). The repository is named
pgweb-static.git, and is also located on git.postgresql.org. These
files should be made visible through the webserver at the /files/ url.

Batch jobs and integrations
---------------------------
There are a number of batch jobs expected to run on the server, in
order to fetch data from other locations. There are also jobs that
need to run on a different server, such as the ftp server, to push
information to the main server. For more details about these, see the
batch.rst file.