Index: branches/ariel/xmldumps-backup/README.config |
— | — | @@ -0,0 +1,203 @@ |
| 2 | +Configuration file documentation |
| 3 | + |
| 4 | +===How to run using a given configuration file |
| 5 | + |
| 6 | +The configuration file for monitor, worker, monitor.py and worker.py is |
| 7 | +called wikidump.conf by default. To specify a different configuration |
| 8 | +file, supply it as an argument on the command line as follows: |
| 9 | + |
| 10 | +For monitor: |
| 11 | + |
| 12 | +monitor name-of-config-file-here |
| 13 | + |
| 14 | +For worker: |
| 15 | + |
| 16 | +worker name-of-config-file-here |
| 17 | + |
| 18 | +For monitor.py: |
| 19 | + |
| 20 | +python monitor.py name-of-config-file-here |
| 21 | + |
| 22 | +For worker.py: |
| 23 | + |
| 24 | +python worker.py [other-options] --configfile name-of-config-file-here wikidbname-here |
| 25 | + |
| 26 | +===Structure of a configuration file |
| 27 | + |
| 28 | +Each section of the configuration file starts with a name in brackets, with |
| 29 | +no leading spaces. For example: |
| 30 | + |
| 31 | +[wiki] |
| 32 | + |
| 33 | +This would introduce the options related to the wikis that are processed. |
| 34 | + |
| 35 | +The following sections are recognized and must be present, even if no |
| 36 | +configuration options are provided for the section: |
| 37 | + |
| 38 | +wiki, output, reporting, database, tools, cleanup, chunks |
| 39 | + |
| 40 | +FIXME |
| 41 | +Of these, the sections wiki, .. and chunks are mandatory and must have entries. |
| 42 | + |
| 43 | +===Wiki section |
| 44 | + |
| 45 | +The wiki section accepts the following configuration options: |
| 46 | + |
| 47 | +dblist -- File with list of all databases for which dumps will be generated |
| 48 | + Default value: none |
| 49 | +skipdblist -- ... except for the ones in this file. (This is a bit odd; |
| 50 | + why not just list the ones you want and be done with it? |
| 51 | + Because the WMF list is generated automatically and used |
| 52 | + for other things, so it is not feasible to remove dbs |
| 53 | + from it by hand and still keep it in sync as new projects |
| 54 | + are created.) |
| 55 | + Default value: none |
| 56 | +privatelist -- File with list of databases which should have dumps produced |
| 57 | + that are put in the "private" directory. At WMF this means |
| 58 | + wikis that are not publicly readable by the world. |
| 59 | + Default value: none |
| 60 | +flaggedrevslist -- File with list of databases which have flagged revisions |
| 61 | + enabled. (Really, we should be able to determine this |
| 62 | + another way instead of keeping a separate list, right?) |
| 63 | +biglist -- File with list of large wikis for which no history dumps are |
| 64 | + generated because they are too huge. (This must be an old |
| 65 | + deprecated option; these days we do not care how big they |
| 66 | + are, we dump them anyway.) |
| 67 | + Default value: none |
| 68 | +dir -- Full path to the root directory of the MediaWiki installation for which |
| 69 | + dumps are produced. This assumes one installation for |
| 70 | + multiple wikis, and therefore one LocalSettings.php or |
| 71 | + equivalent that covers all the projects. At WMF this is done |
| 72 | + by having the files InitialiseSettings.php and |
| 73 | + CommonSettings.php which have various if stanzas depending |
| 74 | + on what is enabled on specific projects. |
| 75 | + Default value: none |
| 76 | +forcenormal -- Geez, I have no idea what this does. Maybe we can toss it. |
| 77 | + Default value: 0 |
| 78 | +halt -- what does this do? |
| 79 | + Default value: 0 |
| 80 | + |
| 81 | +Of those options, the following are required: |
| 82 | +... |
| 83 | + |
| 84 | + |
| 85 | +=== Output section |
| 86 | +public -- full path to directory under which all dumps will be created, |
| 87 | + in subdirectories named for the name of the database |
| 88 | + (wikiproject) being dumped, in subdirectories by date |
| 89 | + Default value: /dumps/public |
| 90 | +private -- full path to directory under which all dumps of private wikis |
| 91 | + and all private tables will be created, in subdirs by project |
| 92 | + name and underneath that in subdirs by date, similar to the |
| 93 | + public dumps |
| 94 | + Default value: /dumps/private |
| 95 | +index -- name of the top-level index file for all projects that is |
| 96 | + automatically created by the monitoring process |
| 97 | + Default value: index.html |
| 98 | +webroot -- url to root of the web directory which serves the public files (this |
| 99 | + is simply the web url that gets people to the content in the "public" |
| 100 | + directory defined earlier) |
| 101 | + Default value: http://localhost/dumps |
| 102 | +templatedir -- directory in which various template files such as those for mail or |
| 103 | + error reports, rss feed updates or the per-project-and-date html files |
| 104 | + are found |
| 105 | + Default value: home |
| 106 | +perdumpindex -- name of the index file created for a dump for a given project |
| 107 | + on a given date |
| 108 | + Default value: index.html |
| 109 | + |
| 110 | +The above options do not have to be specified in the config file, |
| 111 | +since default values are provided. |
| 112 | + |
| 113 | +=== Reporting section |
| 114 | +adminmail -- email address to which to send error reports |
| 115 | + Default value: root@localhost |
| 116 | +mailfrom -- email address from which we pretend to send error reports |
| 117 | + (shows up in the From: line) |
| 118 | + Default value: root@localhost |
| 119 | +smtpserver -- FQDN of smtp server for sending error reports via email |
| 120 | + Default value: localhost |
| 121 | +staleage -- how many seconds a lock file from a dump run can be lying |
| 122 | + around without updating of the status file for that run, |
| 123 | + until the lock file is considered "stale", i.e. that there |
| 124 | + is probably no process actually running for that dump |
| 125 | + any more |
| 126 | + Default value: 3600 |
| 127 | + |
| 128 | +The above options do not have to be specified in the config file, |
| 129 | +since default values are provided. |
| 130 | + |
| 131 | +=== Database section |
| 132 | +user -- user with which to connect to the db for mysqldump of tables |
| 133 | + Default value: root |
| 134 | +password -- password for the above user |
| 135 | + Default value: "" |
| 136 | + |
| 137 | +The above options do not have to be specified in the config file, |
| 138 | +since default values are provided. |
| 139 | + |
| 140 | +=== Tools section |
| 141 | +php -- Location of the php binary |
| 142 | + Default value: /bin/php |
| 143 | +bzip2 -- Location of the bzip2 binary |
| 144 | + Default value: /usr/bin/bzip2 |
| 145 | +gzip2 -- this should get changed to gzip :-D Location |
| 146 | + of the gzip binary |
| 147 | + Default value: /usr/bin/gzip |
| 148 | +sevenzip -- Location of the 7zip binary |
| 149 | + Default value: /bin/7za |
| 150 | +mysql -- Location of the mysql binary |
| 151 | + Default value: /usr/bin/mysql |
| 152 | +mysqldump -- Location of the mysqldump binary |
| 153 | + Default value: /usr/bin/mysqldump |
| 154 | +head -- Location of the head binary |
| 155 | + Default value: /usr/bin/head |
| 156 | +tail -- Location of the tail binary |
| 157 | + Default value: /usr/bin/tail |
| 158 | +cat -- Location of the cat binary |
| 159 | + Default value: /bin/cat |
| 160 | +grep -- Location of the grep binary |
| 161 | + Default value: /bin/grep |
| 162 | + |
| 163 | +The above options do not have to be specified in the config file, |
| 164 | +since default values are provided. |
| 165 | + |
| 166 | +=== Cleanup section |
| 167 | +keep -- number of dumps per wiki project to keep before we start |
| 168 | + removing the oldest one each time a new one is created |
| 169 | + Default value: 3 |
| 170 | + |
| 171 | +The above option does not have to be specified in the config file, |
| 172 | +since a default is provided. |
| 173 | + |
| 174 | +=== Chunks section |
| 175 | +chunksEnabled -- buggy. set to any value to enable. Why? Because |
| 176 | + any string value counts as "true", even the value... |
| 177 | + "False" :-D |
| 178 | + Default value: False |
| 179 | +pagesPerChunkHistory |
| 180 | + Set to a comma separated list of page counts, one per |
| 181 | + chunk, in order to generate a set of stub files each one |
| 182 | + starting from the page ID after the end of the previous chunk. |
| 183 | + Example: |
| 184 | + pagesPerChunkHistory=5000,5000,100000,100000 |
| 185 | + This would generate four chunks, containing: |
| 186 | + 1 to 5000, 5001 through 10000, 10001 through 110000, |
| 187 | + 110001 through end |
| 188 | + Alternatively you can provide one number in which case |
| 189 | + the job will be split into chunks each containing that |
| 190 | + number of pages. Example: |
| 191 | + pagesPerChunkHistory=50000 |
| 192 | + This will generate a number of chunks with pages from |
| 193 | + 1 through 50000, 50001 through 100000, 100001 through |
| 194 | + 150000, and so on. |
| 195 | + Default value: False |
| 196 | +revsPerChunkHistory -- currently disabled, do not use! |
| 197 | + Default value: False |
| 198 | +pagesPerChunkAbstract -- as pagesPerChunkHistory but for the abstract |
| 199 | + generation phase |
| 200 | + Default value: False |
| 201 | + |
| 202 | +The above options do not have to be specified in the config file, |
| 203 | +since default values are provided. |
| 204 | + |
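The two pagesPerChunkHistory examples in the Chunks section can be read as the arithmetic below. This is a minimal sketch, not the code in worker.py: the function name chunk_ranges and the max_page_id parameter are hypothetical, and capping the last range at max_page_id in the single-number case is an assumption made here so that the loop terminates.

```python
def chunk_ranges(spec, max_page_id=None):
    """Turn a pagesPerChunkHistory value into (start, end) page ID ranges.

    An end of None means "through the last page", as in the README's
    "110001 through end". Hypothetical helper, for illustration only.
    """
    sizes = [int(part) for part in spec.split(",")]
    if any(size <= 0 for size in sizes):
        raise ValueError("chunk sizes must be positive integers")

    ranges = []
    start = 1

    if len(sizes) == 1:
        # Single number: equal-sized chunks until the pages run out.
        # Assumption: the caller knows the highest page ID.
        if max_page_id is None:
            raise ValueError("max_page_id is required for a single chunk size")
        size = sizes[0]
        while start <= max_page_id:
            end = min(start + size - 1, max_page_id)
            ranges.append((start, end))
            start = end + 1
        return ranges

    # List of numbers: every chunk but the last covers exactly that many
    # pages; the last chunk is open-ended, matching the README example.
    for size in sizes[:-1]:
        ranges.append((start, start + size - 1))
        start += size
    ranges.append((start, None))
    return ranges


if __name__ == "__main__":
    print(chunk_ranges("5000,5000,100000,100000"))
    print(chunk_ranges("50000", max_page_id=120000))
```

With the list form, the final number only signals that a fourth chunk exists; its size is not used as an upper bound, which is how the README's "110001 through end" reads.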