r95455 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r95455
Date: 22:52, 24 August 2011
Author: ariel
Status: ok
Tags:
Comment:
bring main README up to date, minor cleanups
Modified paths:
  • /branches/ariel/xmldumps-backup/README (modified)

Diff

Index: branches/ariel/xmldumps-backup/README
@@ -6,40 +6,45 @@
 
 === Worker ===
 
-Each dump machine runs a worker process which continuously generates dumps.
+Each dump machine runs a worker process, a shell script which continuously
+calls a python script to generate a dump for the next available wiki.
 At each iteration, the set of wikis is ordered by last dump date, and the
 least-recently-touched wiki is selected.
 
-Workers are kept from stomping on each other by creating a lock file in
-the private dump directory. To aid in administration, the lock file contains
+There are two directory trees used by the dumps processes, one for public
+tables and files of public wikis, and one for private wikis or for private
+tables and files (such as the user table) of public wikis.
+
+Workers (the python scripts) are kept from stomping on each other by creating
+a lock file in the private dump directory for the specific wiki. The lock file contains
 the hostname and process ID of the worker process holding the lock.
 
 Lock files are touched every 10 seconds while the process runs, and removed
 at the end.
 
-On each iteration, the script and configuration are reloaded, so additions
-to the database list or dump code will be made available without manually
-restarting things.
+On each iteration, a new copy of the python script is run, which reads its
+configuration files from scratch, so additions to the database list files or
+changes to the dupm script introduced during the middle of one dump will
+go into effect at the start of the next dump.
 
-
 === Monitor ===
 
-One master machine runs the monitor process, which periodically sweeps all
-wikis for their current status. This accomplishes two tasks:
+One server runs the monitor process, which periodically sweeps all
+public dump directories (one per wiki) for their current status. This accomplishes two tasks:
 
 * The index page is updated with a summary of dump states
-* Aborted dumps are detected and cleaned up
+* Aborted dumps are detected and cleaned up (how complete is this?)
 
 A lock file that has not been touched in some time is detected as stale,
 indicating that the worker process holding the lock has died. The status
 for that dump can then be updated from running to stopped, and the lock
-file is removed so that the wiki will get redumped later.
+file is removed so that the wiki will get dumped again later.
 
+== Code ==
 
-== Code files ==
-
 worker.py
-- Runs a dump for the least-recently dumped wiki in the stack.
+- Runs a dump for the least-recently dumped wiki in the stack, or the desired wiki
+ can be specified from the command line
 
 monitor.py
 - Generates the site-wide index summary and removes stale locks.
@@ -47,7 +52,16 @@
 WikiDump.py
 - Shared classes and functions
 
+CommandManagement.py
+- Classes for running multiple commands at the same time, used for running some phases
+ of the dumps in multiple pieces at the same time, for speed
 
+mwbzutils/
+- Library of utilities for working with bzip2 files, used for locating
+ an arbitrary XML page in a dump file, checking that the file was written
+ out completely without truncation, and other tools. See the README in
+ the directory for more details.
+
 == Configuration ==
 
 Configuration is done with an INI-style configuration file wikidump.conf.
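
Note on the locking scheme: the Worker and Monitor sections of the README describe a per-wiki lock file that holds the hostname and process ID, is touched every 10 seconds, and is treated as stale once it has gone untouched for "some time". The sketch below illustrates that idea only. It is not the code in worker.py or monitor.py. The function names, the "hostname pid" file layout, and the staleness threshold are all assumptions made for the example.

```python
import os
import socket
import time


def acquire_lock(lock_path):
    """Try to create the lock file; return False if it already exists."""
    try:
        # O_EXCL makes the create fail if another worker already holds the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        # Assumed layout: "<hostname> <pid>" on a single line.
        f.write("%s %d\n" % (socket.gethostname(), os.getpid()))
    return True


def read_lock(lock_path):
    """Return (hostname, pid) recorded in the lock file."""
    with open(lock_path) as f:
        host, pid = f.read().strip().rsplit(" ", 1)
    return host, int(pid)


def refresh_lock(lock_path):
    """Touch the lock file so a monitor can see the worker is still alive."""
    os.utime(lock_path, None)


def is_stale(lock_path, max_age_seconds, now=None):
    """True if the lock file has not been touched within max_age_seconds."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(lock_path) > max_age_seconds


def release_lock(lock_path):
    """Remove the lock file at the end of a dump run."""
    os.remove(lock_path)
```

In this sketch a worker would call refresh_lock roughly every 10 seconds while a dump runs, and a monitor would call is_stale with whatever threshold it considers safe before removing the file. The README does not state that threshold, so max_age_seconds is left as a parameter.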
