r81365 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r81365
Date:01:15, 2 February 2011
Author:ariel
Status:deferred
Tags:
Comment:
initial documentation for config files
Modified paths:
  • /branches/ariel/xmldumps-backup/README.config (added)

Diff

Index: branches/ariel/xmldumps-backup/README.config
@@ -0,0 +1,203 @@
Configuration file documentation

=== How to run using a given configuration file

The configuration file for monitor, worker, monitor.py and worker.py is
called wikidump.conf by default. To use a different configuration file,
supply it as an argument on the command line as follows:

For monitor:

monitor name-of-config-file-here

For worker:

worker name-of-config-file-here

For monitor.py:

python monitor.py name-of-config-file-here

For worker.py:

python worker.py [other-options] --configfile name-of-config-file-here wikidbname-here

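This is a minimal Python sketch of the two calling conventions above. For monitor.py the config file is an optional positional argument. For worker.py it is given with --configfile, followed by the wiki database name. This is not the scripts' actual argument handling. monitor_config and worker_config are hypothetical names, and worker.py's other options are left out.

```python
import argparse
from typing import List, Tuple

DEFAULT_CONFIG = "wikidump.conf"  # default name documented above


def monitor_config(argv: List[str]) -> str:
    """monitor.py style: the config file is an optional positional argument."""
    return argv[1] if len(argv) > 1 else DEFAULT_CONFIG


def worker_config(args: List[str]) -> Tuple[str, str]:
    """worker.py style: --configfile option followed by the wiki db name."""
    parser = argparse.ArgumentParser(prog="worker.py")
    parser.add_argument("--configfile", default=DEFAULT_CONFIG)
    parser.add_argument("wikidb")
    ns = parser.parse_args(args)
    return ns.configfile, ns.wikidb


if __name__ == "__main__":
    print(monitor_config(["monitor.py"]))
    print(worker_config(["--configfile", "test.conf", "enwiki"]))
```
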
=== Structure of a configuration file

Each section of the configuration file starts with the section name in
square brackets, with no leading spaces. For example:

[wiki]

This introduces the options related to the wikis that are processed.

The following sections are recognized and must be present, even if no
configuration options are provided for the section:

wiki, output, reporting, database, tools, cleanup, chunks

FIXME
Of these, the sections wiki, .. and chunks are mandatory and must have entries.

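The [section] layout is the INI syntax of Python's configparser module. The scripts presumably read it with ConfigParser; Python 3 is shown here. The following minimal sketch assumes the defaults are passed to the parser. The DEFAULTS dict and the sample file are illustrative, not the real ones. It shows why a section must be present even when it is empty: options fall back to the defaults, but asking for an option in a missing section raises NoSectionError.

```python
import configparser

# Illustrative subset of the documented defaults (not the scripts' actual table).
DEFAULTS = {"keep": "3", "staleage": "3600", "adminmail": "root@localhost"}

# Hypothetical config: [reporting] is present but empty, [tools] is missing.
SAMPLE = """
[wiki]
dblist = /etc/dumps/all.dblist

[reporting]

[cleanup]
keep = 5
"""

conf = configparser.ConfigParser(defaults=DEFAULTS)
conf.read_string(SAMPLE)

print(conf.get("cleanup", "keep"))        # set in the file
print(conf.get("reporting", "staleage"))  # empty section: falls back to the default

try:
    conf.get("tools", "php")
except configparser.NoSectionError as err:
    # Defaults do not help when the section itself is absent.
    print("missing section:", err.section)
```
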
=== Wiki section

The wiki section accepts the following configuration options:

dblist -- File with the list of all databases for which dumps will be generated
 Default value: none
skipdblist -- File with a list of databases from dblist for which no dumps
 should be generated. (This is a bit odd; why not just list the ones
 you want and be done with it? Because the WMF list is generated
 automatically and used for other things, so it is not feasible to
 remove dbs from it by hand and still keep it in sync as new projects
 are created.)
 Default value: none
privatelist -- File with list of databases whose dumps should be put in
 the "private" directory. At WMF these are the wikis that are not
 publicly readable by the world.
 Default value: none
flaggedrevslist -- File with list of databases which have flagged revisions
 enabled. (Really, we should be able to determine this
 another way instead of keeping a separate list, right?)
biglist -- File with list of large wikis for which no history dumps are
 generated because they are too huge. (This must be an old,
 deprecated option; these days we do not care how big they
 are, we dump them anyway.)
 Default value: none
dir -- Full path to the root directory of the MediaWiki installation for which
 dumps are produced. This assumes one installation for
 multiple wikis, and therefore one LocalSettings.php or
 equivalent that covers all the projects. At WMF this is done
 with the files InitialiseSettings.php and CommonSettings.php,
 which have various if stanzas depending on what is enabled
 on specific projects.
 Default value: none
forcenormal -- Geez, I have no idea what this does. Maybe we can toss it.
 Default value: 0
halt -- What does this do?
 Default value: 0

Of those options, the following are required:
...

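The interplay of dblist, skipdblist and privatelist can be sketched as follows. This is a minimal illustration, not the scripts' code. It assumes each list file holds one database name per line. The function names and the example database names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def read_dblist(path: Optional[str]) -> List[str]:
    """Read a list file, assumed to hold one database name per line."""
    if not path:  # option not set (default: none)
        return []
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def wikis_to_dump(dblist: List[str], skip: List[str],
                  private: List[str]) -> List[Tuple[str, str]]:
    """Return (database, "public" or "private") for each wiki to dump."""
    skipped, private_set = set(skip), set(private)
    return [(db, "private" if db in private_set else "public")
            for db in dblist if db not in skipped]


print(wikis_to_dump(["enwiki", "dewiki", "officewiki", "testwiki"],
                    skip=["testwiki"], private=["officewiki"]))
```

Using sets for the skip and private lists keeps each lookup constant-time, which matters little for a few hundred wikis but costs nothing.
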
=== Output section
public -- full path to the directory under which all dumps will be created,
 in subdirectories named after the database (wiki project) being
 dumped, and within those in subdirectories by date
 Default value: /dumps/public
private -- full path to the directory under which all dumps of private wikis
 and all private tables will be created, in subdirectories by project
 name and within those in subdirectories by date, similar to the
 public dumps
 Default value: /dumps/private
index -- name of the top-level index file for all projects that is
 automatically created by the monitoring process
 Default value: index.html
webroot -- URL of the root of the web directory which serves the public files
 (this is simply the web URL that gets people to the content in the
 "public" directory defined earlier)
 Default value: http://localhost/dumps
templatedir -- directory in which various template files are found, such as
 those for mail or error reports, RSS feed updates or the
 per-project-and-date HTML files
 Default value: home
perdumpindex -- name of the index file created for a dump for a given project
 on a given date
 Default value: index.html

The above options do not have to be specified in the config file,
since default values are provided.

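This sketch shows how the output options might combine into a directory and a URL for one dump run. It assumes that the per-date subdirectory is named YYYYMMDD and that webroot maps directly onto the public directory. Both are assumptions, and the helper names are hypothetical.

```python
import posixpath

PUBLIC = "/dumps/public"            # documented default for "public"
PRIVATE = "/dumps/private"          # documented default for "private"
WEBROOT = "http://localhost/dumps"  # documented default for "webroot"


def dump_dir(db: str, date: str, is_private: bool = False) -> str:
    """Directory for one run: <public or private>/<database>/<date>."""
    base = PRIVATE if is_private else PUBLIC
    return posixpath.join(base, db, date)


def dump_index_url(db: str, date: str, perdumpindex: str = "index.html") -> str:
    """Web URL of the per-dump index page (public dumps only)."""
    return "/".join([WEBROOT.rstrip("/"), db, date, perdumpindex])


print(dump_dir("enwiki", "20110201"))
print(dump_dir("officewiki", "20110201", is_private=True))
print(dump_index_url("enwiki", "20110201"))
```
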
=== Reporting section
adminmail -- email address to which error reports are sent
 Default value: root@localhost
mailfrom -- email address from which we pretend to send error reports
 (shows up in the From: line)
 Default value: root@localhost
smtpserver -- FQDN of the SMTP server for sending error reports via email
 Default value: localhost
staleage -- number of seconds a lock file from a dump run can be lying
 around without the status file for that run being updated
 before the lock file is considered "stale", i.e. there is
 probably no process still running for that dump
 Default value: 3600

The above options do not have to be specified in the config file,
since default values are provided.

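This is a minimal sketch of the staleage test. It assumes staleness is judged from the modification time of the run's status file. The function names are hypothetical, and the real scripts may compare differently (for instance with >= rather than >).

```python
import os
import time
from typing import Optional

STALEAGE = 3600  # documented default, in seconds


def lock_is_stale(status_mtime: float, now: float, staleage: int = STALEAGE) -> bool:
    """True if the status file has not been updated for more than staleage seconds."""
    return now - status_mtime > staleage


def lock_is_stale_for(status_file: str, staleage: int = STALEAGE,
                      now: Optional[float] = None) -> bool:
    """Same check, reading the status file's modification time from disk."""
    if now is None:
        now = time.time()
    return lock_is_stale(os.path.getmtime(status_file), now, staleage)
```

Keeping the time comparison in a pure function makes it easy to test without touching the filesystem.
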
=== Database section
user -- user with which to connect to the db for mysqldump of tables
 Default value: root
password -- password for the above user
 Default value: ""

The above options do not have to be specified in the config file,
since default values are provided.

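This sketch shows one way user and password, together with the mysqldump path from the tools section, might be turned into a mysqldump invocation. -u and --password= are standard mysqldump options. The host, any other flags the scripts pass, and the function name are not documented here and are assumptions.

```python
from typing import List


def mysqldump_command(mysqldump: str, user: str, password: str,
                      db: str, table: str) -> List[str]:
    """Argument list for dumping one table with the configured credentials."""
    cmd = [mysqldump, "-u", user]
    if password:  # default is "", in which case no password option is passed
        cmd.append("--password=" + password)
    cmd += [db, table]
    return cmd


print(mysqldump_command("/usr/bin/mysqldump", "root", "", "enwiki", "site_stats"))
```

A password on the command line is visible in the process list. MySQL option files avoid that, if it matters on your hosts.
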
=== Tools section
php -- Location of the php binary
 Default value: /bin/php
bzip2 -- Location of the bzip2 binary
 Default value: /usr/bin/bzip2
gzip2 -- Location of the gzip binary (this option should get renamed
 to gzip :-D)
 Default value: /usr/bin/gzip
sevenzip -- Location of the 7zip binary
 Default value: /bin/7za
mysql -- Location of the mysql binary
 Default value: /usr/bin/mysql
mysqldump -- Location of the mysqldump binary
 Default value: /usr/bin/mysqldump
head -- Location of the head binary
 Default value: /usr/bin/head
tail -- Location of the tail binary
 Default value: /usr/bin/tail
cat -- Location of the cat binary
 Default value: /bin/cat
grep -- Location of the grep binary
 Default value: /bin/grep

The above options do not have to be specified in the config file,
since default values are provided.

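Every default above is an absolute path that may not match a given host, so a pre-flight check can catch a misconfigured tools section early. This is a minimal sketch that is not part of the scripts. missing_tools is a hypothetical helper, and the table simply repeats the documented defaults, including the gzip2 key name.

```python
import os
from typing import Dict, List

# Documented defaults from the tools section (note the gzip2 key name).
TOOLS = {
    "php": "/bin/php",
    "bzip2": "/usr/bin/bzip2",
    "gzip2": "/usr/bin/gzip",
    "sevenzip": "/bin/7za",
    "mysql": "/usr/bin/mysql",
    "mysqldump": "/usr/bin/mysqldump",
    "head": "/usr/bin/head",
    "tail": "/usr/bin/tail",
    "cat": "/bin/cat",
    "grep": "/bin/grep",
}


def missing_tools(tools: Dict[str, str]) -> List[str]:
    """Names of configured tools whose path is not an executable file."""
    return [name for name, path in tools.items()
            if not (os.path.isfile(path) and os.access(path, os.X_OK))]


if __name__ == "__main__":
    for name in missing_tools(TOOLS):
        print("not found or not executable:", name, TOOLS[name])
```
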
=== Cleanup section
keep -- number of dumps per wiki project to keep before we start
 removing the oldest one each time a new one is created
 Default value: 3

The above option does not have to be specified in the config file,
since a default is provided.

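The keep rule amounts to retaining the newest keep runs per wiki. This is a minimal sketch that assumes per-run directories are named by date as YYYYMMDD, so sorting the names orders them chronologically. Whether the scripts count the run being created toward keep is not stated here. dumps_to_remove is a hypothetical name.

```python
from typing import List


def dumps_to_remove(dump_dates: List[str], keep: int = 3) -> List[str]:
    """Oldest run directories beyond the newest `keep` ones."""
    ordered = sorted(dump_dates)  # YYYYMMDD names sort chronologically
    if len(ordered) <= keep:
        return []
    return ordered[:len(ordered) - keep]


print(dumps_to_remove(["20110101", "20110115", "20110201", "20110105"]))
```
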
=== Chunks section
chunksEnabled -- buggy: set to any value to enable. Why? Because
 any string value counts as "true", even the value
 "False" :-D
 Default value: False
pagesPerChunkHistory -- Set to a comma-separated list of page counts,
 one per chunk, in order to generate a set of stub files, each
 one starting from the page ID after the end of the previous
 chunk; the last chunk runs through the end.
 Example:
 pagesPerChunkHistory=5000,5000,100000,100000
 This would generate four chunks, containing pages:
 1 through 5000, 5001 through 10000, 10001 through 110000,
 110001 through end
 Alternatively you can provide one number, in which case
 the job will be split into chunks each containing that
 number of pages. Example:
 pagesPerChunkHistory=50000
 This will generate a number of chunks with pages from
 1 through 50000, 50001 through 100000, 100001 through
 150000, and so on.
 Default value: False
revsPerChunkHistory -- currently disabled, do not use!
 Default value: False
pagesPerChunkAbstract -- as pagesPerChunkHistory, but for the abstract
 generation phase
 Default value: False

The above options do not have to be specified in the config file,
since default values are provided.
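
The page ranges described for pagesPerChunkHistory can be computed as follows. This is a minimal sketch, not the scripts' code. It assumes the highest page ID of the wiki is known; the max_page_id parameter and the function name are hypothetical. Following the example in the text, the last chunk of a list runs through the end.

```python
from typing import List, Tuple


def chunk_ranges(setting: str, max_page_id: int) -> List[Tuple[int, int]]:
    """Inclusive (first_page_id, last_page_id) pairs for a pagesPerChunk* value."""
    counts = [int(n) for n in setting.split(",")]
    if any(n <= 0 for n in counts):
        raise ValueError("chunk sizes must be positive")
    ranges: List[Tuple[int, int]] = []
    start = 1
    if len(counts) == 1:
        # One number: fixed-size chunks until max_page_id is covered.
        size = counts[0]
        while start <= max_page_id:
            end = min(start + size - 1, max_page_id)
            ranges.append((start, end))
            start = end + 1
    else:
        # A list: one chunk per entry, except that the last chunk
        # runs through max_page_id whatever its nominal size.
        for size in counts[:-1]:
            end = start + size - 1
            ranges.append((start, end))
            start = end + 1
        ranges.append((start, max_page_id))
    return ranges


print(chunk_ranges("5000,5000,100000,100000", 250000))
print(chunk_ranges("50000", 120000))
```

The sketch does not handle a list whose counts add up to more than max_page_id. A real implementation would need to decide whether to drop or shorten the surplus chunks.
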
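The chunksEnabled quirk is ordinary Python truthiness. configparser returns option values as strings, and any non-empty string, including "False", is true. This minimal sketch assumes the value is read with get() and then tested directly. getboolean() is the configparser method that parses "False", "no", "0" and "off" as false.

```python
import configparser

conf = configparser.ConfigParser()
conf.read_string("[chunks]\nchunksEnabled = False\n")

raw = conf.get("chunks", "chunksEnabled")
print(repr(raw), bool(raw))                        # 'False' True
print(conf.getboolean("chunks", "chunksEnabled"))  # False
```
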
