From 1a8251e015f92d9cb3b5a6a24853be87626b979f Mon Sep 17 00:00:00 2001
From: Magnus Hagander
Date: Mon, 14 Jun 2010 21:40:15 +0200
Subject: [PATCH] Add a whole bunch of basic documentation.

I'm sure there's more to do, but this is at least a start.
---
 docs/README.rst         |   6 ++
 docs/authentication.rst |  20 +++++++
 docs/batch.rst          |  44 +++++++++++++++
 docs/django.rst         | 118 ++++++++++++++++++++++++++++++++++++++++
 docs/frontend.rst       |  50 +++++++++++++++++
 docs/navigation.rst     |  11 ++++
 docs/overview.rst       |  53 ++++++++++++++++++
 7 files changed, 302 insertions(+)
 create mode 100644 docs/README.rst
 create mode 100644 docs/authentication.rst
 create mode 100644 docs/batch.rst
 create mode 100644 docs/django.rst
 create mode 100644 docs/frontend.rst
 create mode 100644 docs/navigation.rst
 create mode 100644 docs/overview.rst

diff --git a/docs/README.rst b/docs/README.rst
new file mode 100644
index 00000000..f2bf70b4
--- /dev/null
+++ b/docs/README.rst
@@ -0,0 +1,6 @@
+Documentation
+=============
+
+This directory holds some basic documentation for the pgweb system. It
+doesn't claim to be complete in any way, and any contributions to make
+it so are always welcome!
diff --git a/docs/authentication.rst b/docs/authentication.rst
new file mode 100644
index 00000000..4fc2291a
--- /dev/null
+++ b/docs/authentication.rst
@@ -0,0 +1,20 @@
+Authentication
+==============
+The authentication system provides the base for the community login
+system, as well as the django system. The functions defined in
+sql/community_login.sql implement the community login system (existing
+API) on top of the django authentication, as well as a function to
+access all users defined in the old community login system.
+
+The custom authentication provider pgweb.util.auth.AuthBackend
+implements the community login system migration functionality. It will
+first attempt to log the user in with the standard django system.
If
+this fails, it will attempt to log the user in with the *old*
+community login system, and if this succeeds the user will
+automatically be migrated to the django authentication system, and
+removed from the old system.
+
+In a local installation that does not have access to the existing set
+of users, this authentication backend can be disabled completely, and
+the system will function perfectly fine relying on just the django
+authentication system.
diff --git a/docs/batch.rst b/docs/batch.rst
new file mode 100644
index 00000000..7b076fb2
--- /dev/null
+++ b/docs/batch.rst
@@ -0,0 +1,44 @@
+Batch jobs
+==========
+
+The system relies on a number of batch jobs that run on the webserver
+or on another machine, feeding data into the system. These batch jobs
+should generally be run under a user account that does *not* have write
+permissions in the web directories - any exceptions should be clearly
+noted here. Most of the jobs should run regularly from cron, some
+should be run manually when required.
+
+All batch jobs are located in the directory tools/.
+
+docs/docload.py
+---------------
+This script will load a new set of documentation. Simply specify the
+version to load and point out the tarball to load from. The script
+will automatically decompress the tarball as necessary, and also
+perform HTML tidying of the documentation (since the HTML generated by
+the PostgreSQL build system isn't particularly standards-conforming or
+nice-looking).
+
+ftp/spider_ftp.py
+-----------------
+This script needs to be run on the *ftp server*, not on the
+webserver. It will generate a python pickle file that should then be
+copied to the webserver to serve as the basis for the ftp browser (so
+it's not required to sync the entire ftp site to the webserver). The
+copying of the pickle file is not included in the script.
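A rough sketch of the idea behind the spider, not the actual tools/ftp/spider_ftp.py: walk a directory tree on the ftp server, collect it into a plain dictionary, and pickle that so the webserver can browse the listing without holding a copy of the ftp site. The function names and the dictionary layout are assumptions made for illustration only.

```python
import os
import pickle


def spider_tree(root):
    """Map each directory (relative to root) to its subdirs and file sizes.

    Hypothetical layout: {"rel/dir": {"dirs": [...], "files": {name: size}}}
    """
    tree = {}
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        key = "" if rel == "." else rel.replace(os.sep, "/")
        tree[key] = {
            "dirs": sorted(dirnames),
            "files": {
                name: os.path.getsize(os.path.join(dirpath, name))
                for name in sorted(filenames)
            },
        }
    return tree


def write_pickle(tree, path):
    # Run on the ftp server; copying the file to the webserver is a separate step.
    with open(path, "wb") as f:
        pickle.dump(tree, f)


def read_pickle(path):
    # Run on the webserver to load the listing for the ftp browser.
    with open(path, "rb") as f:
        return pickle.load(f)
```

Since a pickle file can execute code when loaded, it should only be read from a trusted source, which holds here as long as the file comes from the project's own ftp server.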
+
+moderation/moderation_report.py
+-------------------------------
+This script enumerates all unmoderated objects in the database and
+generates an email to the slaves list if there are any pending, to
+prod the moderators to do their job.
+
+rss/fetch_rss_feeds.py
+----------------------
+This script will connect to all the RSS feeds registered in the RSS
+application and fetch their articles into the database. It's not very
+accepting of strange RSS feeds - it requires them to be "nicely
+formatted". Usually that's not a problem since we only pull in the
+headlines and not the contents. For a more complete RSS fetcher that
+stores the data in a PostgreSQL database, see the "hamn" project that
+powers planet.postgresql.org - also available on git.postgresql.org.
diff --git a/docs/django.rst b/docs/django.rst
new file mode 100644
index 00000000..9c8e25e8
--- /dev/null
+++ b/docs/django.rst
@@ -0,0 +1,118 @@
+Django implementation
+======================
+
+The pgweb application is a fairly simple django application, since
+there is no requirement for advanced logic anywhere on the site.
+
+Actual coding practices are, as usual, not fully documented here. When
+developing new functionality, please look at an existing application
+in the pgweb/ directory that does something similar, and use the
+coding practices used there.
+
+Functions and classes should be documented in-line, through comments
+or docstrings.
+
+
+Database access
+---------------
+In all places where database access is simple, the django ORM is used
+to access the data. In the few places where more advanced queries are
+necessary, direct queries to the database are used. There is no
+intention to keep the database structure independent of the database
+used - it's all designed to use PostgreSQL. Therefore, using PostgreSQL
+specific syntax in these direct queries is not a problem.
+
+Module split
+------------
+The module split is not particularly strict, and there is a lot of
+cross-referencing between the modules. This is expected...
+
+Settings
+--------
+All settings should be listed including their default values in the
+shipped settings.py. Modifications should always be made in the
+settings_local.py file (which is in .gitignore) to make sure they're
+not accidentally committed to the main repository, or cause merge conflicts.
+
+Forms
+-----
+There are some special things to consider when dealing with forms. For
+any objects that are going to be moderated, the Model that is used
+should inherit from the PgModel model, instead of just the regular
+django.db.models.Model. When this is done, the send_notification
+attribute should be set to True. This will cause the system to
+automatically send out notifications to the slaves list whenever a new
+object is created or an existing one is modified.
+
+If the form contains any text fields that accept markdown, the
+attribute markdown_fields should be set to a tuple containing a list
+of these fields. This will cause the system to automatically generate
+preview boxes both in the admin interface (provided it's properly
+registered) and on the regular forms.
+
+If the model contains a field for "submitter", it will automatically
+be filled in with the current user - be sure to exclude it from the
+form itself.
+
+Utilities
+---------
+The util/ subdirectory represents a set of utility functions and
+classes, rather than an actual application. This is where common code
+is put, that may be used between multiple modules.
+
+pgweb.util.admin
+++++++++++++++++
+This module contains functionality to help simplify the admin.py
+files.
In particular, it contains a MarkdownPreviewAdmin class and a
+register_markdown function, which are used to register a model to the
+admin interface in a way that will make all text fields that are
+listed as markdown capable have a preview box in the admin interface.
+
+auth.py
++++++++
+This module implements the community login provider for logging into
+both the website itself and the admin interface.
+
+bases.py
+++++++++
+This module implements base classes to inherit from. Specifically, it
+implements the PgModel base class that is used to automatically
+generate notifications.
+
+contexts.py
++++++++++++
+This module implements custom contexts, which are used to implement the
+site navigation.
+
+decorators.py
++++++++++++++
+This module implements custom decorators used to change view
+behavior. This includes decorator ssl_required that makes a view
+require an SSL connection to work, and also nocache and cache
+decorators that control how long a page can be cached by the frontend
+servers.
+
+helpers.py
+++++++++++
+This module implements helper functions and classes wrapping standard
+django functionality to make for less coding, such as form, template
+and XML management.
+
+middleware.py
++++++++++++++
+This module implements a custom django middleware, that will take care
+of redirecting requests to SSL when required (this is controlled by
+the decorator @ssl_required). It will also enable "standard" django
+workarounds for getting access to the user who is currently executing
+the request as part of thread local storage.
+
+misc.py
++++++++
+This module implements misc functions, for things like formatting
+strings and sending email.
+
+moderation.py
++++++++++++++
+This module implements functions related to the moderation of
+different objects in the system (objects that are submitted by
+end-users and need to be approved before we show them on the website).
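As an illustration of the cache and nocache decorators described under decorators.py, here is a minimal sketch of how such decorators could set the Cache-Control header on a view's response. It is not the code in pgweb/util/decorators.py: the hours/minutes/seconds arguments, the exact header values, the example view names and the stand-in Response class (used so the sketch runs without django installed) are all assumptions.

```python
from functools import wraps


class Response(dict):
    """Stand-in for a django HttpResponse: headers are set like dict items."""


def cache(hours=0, minutes=0, seconds=0):
    """Let the frontend caches keep a page for the given time (hypothetical signature)."""
    max_age = hours * 3600 + minutes * 60 + seconds

    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            response = view(request, *args, **kwargs)
            # s-maxage only applies to shared caches such as varnish.
            response["Cache-Control"] = "s-maxage=%d" % max_age
            return response
        return wrapper
    return decorator


def nocache(view):
    """Ask the frontend caches not to keep the page at all (assumed header value)."""
    @wraps(view)
    def wrapper(request, *args, **kwargs):
        response = view(request, *args, **kwargs)
        response["Cache-Control"] = "s-maxage=0"
        return response
    return wrapper


@cache(minutes=10)
def frontpage(request):
    # Hypothetical view with frequently updated data, so a short cache time.
    return Response()


@nocache
def mirror_select(request):
    # Hypothetical view that needs up-to-the-second content.
    return Response()
```

Keeping the policy in a decorator means each view states its own cache lifetime next to its definition, while views without a decorator fall back to whatever default the frontend is configured with.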
diff --git a/docs/frontend.rst b/docs/frontend.rst
new file mode 100644
index 00000000..75c3af85
--- /dev/null
+++ b/docs/frontend.rst
@@ -0,0 +1,50 @@
+Frontend & Backend
+==================
+The postgresql.org website is designed to run in a frontend/backend
+scenario. This is to achieve both load distribution and redundancy for
+the case when the main webserver (known as wwwmaster) goes offline or
+becomes loaded.
+
+Previous versions of the website used static files on the frontend,
+that were spidered at regular intervals and then push-rsynced out to
+the frontends. This made the frontends entirely independent of the
+backends, and as such very "available". Unfortunately it made a lot of
+coding difficult, and had very bad performance (a re-spidering
+including documentation and ftp would take more than 6 hours on a fast
+machine).
+
+This generation of the website will instead rely on a varnish web
+cache running on the frontend servers, configured to cache things for
+a long time. It will also run in what's known as "saint mode", which
+will have varnish keep serving the content from the cache even if it
+has expired in case the backend cannot be contacted. We also utilize
+"grace mode", which has varnish send the cached version of a page
+*while* it's fetching a new one from the backend.
+
+All forms that require login will be processed directly by the master
+server, just like before. These will *always* be processed over SSL,
+and as such not sent through varnish at all. They will be accessed
+under the domain wwwmaster.postgresql.org.
+
+Requests that require *up to the second* content but do *not* require
+a login, such as a mirror selection, will be sent through the
+frontends (and served under the www.postgresql.org name) but without
+caching enabled. Note that in most cases, these should actually be
+cached for at least 5 or 10 seconds, to cut off any short term high
+load effects (aka the slashdot effect).
+
+Normal requests are always cached.
There is a default cache expiry
+that is set on all pages. There is a longer default policy set for
+images, because they are considered never to change. Any view in the
+django project can override this default, by specifying the
+"Cache-control: s-maxage=xxx" http header on the response. This is
+done by using the @cache() decorator on the view method. Caching
+should be kept lower for pages that have frequently updating data,
+such as the front page or the survey results page.
+
+Finally, there is a form on the admin web interface that lets the
+administrator manually expire specific pages using a varnish process
+called *purging*. This will have the backend connect to each of the
+frontend servers and tell them to remove specific objects (controlled
+by a regular expression) right away, and fetch new objects from the
+backend.
diff --git a/docs/navigation.rst b/docs/navigation.rst
new file mode 100644
index 00000000..d32ef0d4
--- /dev/null
+++ b/docs/navigation.rst
@@ -0,0 +1,11 @@
+Navigation
+==========
+The navigation system is based on a django context called NavContext,
+implemented in pgweb.util.contexts.NavContext. This means that all the
+menu links in the system are defined in this file
+(pgweb/util/contexts.py). Each django view needs to specify the
+NavContext in its call to template rendering, and this will make the
+correct nav menu show up.
+
+This is one of the parts of the system that can probably be made a lot
+easier, leaving much room for future improvement :-)
diff --git a/docs/overview.rst b/docs/overview.rst
new file mode 100644
index 00000000..19601632
--- /dev/null
+++ b/docs/overview.rst
@@ -0,0 +1,53 @@
+Overview
+========
+
+Dynamic content
+---------------
+The dynamic content of the website is rendered using Django. The
+django project is called "pgweb", and consists of a number of
+applications under it. These applications are *not* designed to be
+independent, and cross-referencing between them is both allowed and
+normal.
It should therefore not be expected that they will work if
+copied outside the pgweb environment.
+
+For more details about the django implementation, see the django.rst file.
+
+Static HTML content
+-------------------
+For those pages that don't need any database access or other kinds of
+logic, simple HTML templates are used. Any content here is edited as
+plain HTML, and the django template engine is used to wrap this
+content in the regular website framework.
+
+All pages handled this way are stored in templates/pages/, with each
+subdirectory mapping to a sub-url. The code for rendering these pages
+is found in pgweb/core/views.py, function fallback().
+
+Non-HTML content
+----------------
+Non-HTML content is stored in the media/ directory, which is served up
+by django when run under the local webserver, but is expected to be
+served up directly by the webserver when deployed in production. This
+directory has subdirectories for images, css and javascript, as well
+as some imported modules.
+
+Note that there is also /adminmedia/, which is directly linked to the
+django administrative interface media files, that are shipped with
+django and not with pgweb.
+
+Non-web content
+---------------
+Non-web content, such as PDF files and other static data, is handled
+in its own git repository, in order to keep the size of the main
+repository down (since some of these files can be very large). The
+repository is named pgweb-static.git, and also located on
+git.postgresql.org. These files should be made visible through the
+webserver at the /files/ url.
+
+Batch jobs and integrations
+---------------------------
+There are a number of batch jobs expected to run on the server, in
+order to fetch data from other locations. There are also jobs that
+need to run on a different server, such as the ftp server, to push
+information to the main server. For more details about these, see the
+batch.rst file.