Index: trunk/phase3/docs/databases/postgres.txt |
— | — | @@ -0,0 +1,101 @@ |
| 2 | +This document describes the state of Postgres support in MediaWiki. |
| 3 | + |
| 4 | + |
| 5 | +== Overview == |
| 6 | + |
| 7 | +Support for PostgreSQL has been available since version 1.7 |
| 8 | +of MediaWiki, and is fairly well maintained. The main code |
| 9 | +is very well integrated, while extensions are very hit and miss. |
| 10 | +Still, it is probably the most supported database after MySQL. |
| 11 | +Much of the work in making MediaWiki database-agnostic came |
| 12 | +about through the work of creating Postgres support. |
| 13 | + |
| 14 | + |
| 15 | +== Required versions == |
| 16 | + |
| 17 | +The current minimum version of PostgreSQL for MediaWiki is 8.1. |
| 18 | +It is expected that this will be raised to 8.3 at some point, |
| 19 | +as 8.1 and 8.2 are nearing end of life. |
| 20 | + |
| 21 | + |
| 22 | + |
| 23 | +== Database schema == |
| 24 | + |
| 25 | +Postgres has its own schema file at maintenance/postgres/tables.sql. |
| 26 | + |
| 27 | +The goal is to keep this file as close as possible to the canonical |
| 28 | +schema at maintenance/tables.sql, but without copying over |
| 29 | +all the usage comments. General notes on the conversion: |
| 30 | + |
| 31 | +* The use of a true TIMESTAMP rather than the text string that |
| 32 | +MySQL uses is highly encouraged. There are still places in the |
| 33 | +code (especially extensions) which make assumptions about the |
| 34 | +textual nature of timestamp fields, but these can almost always |
| 35 | +be programmed around. |
| 36 | + |
| 37 | +* Although Postgres has a true BOOLEAN type, boolean columns |
| 38 | +are always mapped to SMALLINT, as the code does not always treat |
| 39 | +the column as a boolean (which is limited to accepting true, |
| 40 | +false, 0, 1, t, or f) |
| 41 | + |
| 42 | +* The default data type for all VARCHAR, CHAR, and VARBINARY |
| 43 | +columns should simply be TEXT. The only exception is |
| 44 | +when VARBINARY is used to store true binary data, such as |
| 45 | +the math_inputhash column, in which case BYTEA should be used. |
| 46 | + |
| 47 | +* All integer variants should generally be mapped to INTEGER. |
| 48 | +There is small-to-no advantage in using SMALLINT versus |
| 49 | +INTEGER in Postgres, and the possibility of running out of |
| 50 | +room outweighs such concerns. The columns that are BIGINT |
| 51 | +in other schemas should be INTEGER as well, as none of them |
| 52 | +so far are even remotely likely to reach the 32 billion |
| 53 | +limit of an INTEGER. |
| 54 | + |
| 55 | +* Blobs (blob, tinyblog, mediumblob) should be mapped to TEXT |
| 56 | +whenever possible, and to BYTEA if they are known to contain |
| 57 | +binary data. |
| 58 | + |
| 59 | +* All length modifiers on data types should be removed. If |
| 60 | +they are on an INTEGER, it's probably an error, and if on |
| 61 | +any text-based field, simply using TEXT is preferred. |
| 62 | + |
| 63 | +* Sequences should be explicitly named rather than using |
| 64 | +SERIAL, as the code can depend on having a specific name. |
| 65 | + |
| 66 | +* Foreign keys should be used when possible. This makes things |
| 67 | +both easier and harder in the code, but most of the major |
| 68 | +problems have now been overcome. Always add an explicit ON DELETE |
| 69 | +clause, and consider carefully what choice to use (all things |
| 70 | +considered, prefer CASCADE). |
| 71 | + |
| 72 | +* The use of CIDR should be done very carefully, because the code |
| 73 | +will sometimes want to store things such as an empty string or |
| 74 | +other non-IP value in the column. When in doubt, use TEXT. |
| 75 | + |
| 76 | +* Indexes should be created using the original MySQL tables.sql |
| 77 | +as a guide, but keeping in mind the ability of Postgres to use |
| 78 | +partial indexes, functional indexes, and bitmaps. The index names |
| 79 | +should be logical but are not too important, as they are never |
| 80 | +referenced directly by the code (unlike sequence names). Most of |
| 81 | +the indexes in the file as of this writing are there due to production |
| 82 | +testing of expensive queries on a busy wiki. |
| 83 | + |
| 84 | + |
| 85 | +== Keeping in sync with tables.sql == |
| 86 | + |
| 87 | +The script maintenance/postgres/compare_schemas.pl should be |
| 88 | +periodically run. It will parse both "tables.sql" files and |
| 89 | +produce any differences found. Such differences should be fixed |
| 90 | +or exceptions specifically carved out by editing the script |
| 91 | +itself. This script has also been very useful in finding problems |
| 92 | +in maintenance/tables.sql itself, as it is very strict in the |
| 93 | +format it expects things to be in. :) |
| 94 | + |
| 95 | + |
| 96 | +== Getting help == |
| 97 | + |
| 98 | +In addition to the normal venues (MediaWiki mailing lists |
| 99 | +and IRC channels), the #postgresql channel on irc.freenode.net |
| 100 | +is a friendly and expert resource if you should encounter a |
| 101 | +problem with your Postgres-enabled MediaWiki. |
| 102 | + |