r95685 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: < r95684 | r95685 | r95686 >
Date: 18:57, 29 August 2011
Author: giovanni
Status: deferred
Tags:
Comment:
added documentation to editor_lifecycle
Modified paths:
  • /trunk/tools/wsor/editor_lifecycle/README.rst (modified) (history)
  • /trunk/tools/wsor/editor_lifecycle/TODO.rst (added) (history)

Diff

Index: trunk/tools/wsor/editor_lifecycle/TODO.rst
@@ -0,0 +1,2 @@
 2+* Use `oursql.Cursor.executemany` in `fetchrates`. Presently this is not possible,
 3+ because of a bug in `oursql`. See https://answers.launchpad.net/oursql/+question/166877
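Until that `oursql` bug is fixed, the usual workaround is to emulate `executemany` with a plain loop over `execute`. A minimal sketch of that fallback follows; `executemany_fallback` and `FakeCursor` are hypothetical names used only for illustration, not part of `fetchrates`:

```python
def executemany_fallback(cursor, query, param_seq):
    """Emulate Cursor.executemany with repeated execute() calls."""
    for params in param_seq:
        cursor.execute(query, params)

class FakeCursor:
    """Minimal DB-API-style cursor that records executed statements."""
    def __init__(self):
        self.calls = []

    def execute(self, query, params=()):
        self.calls.append((query, tuple(params)))

cursor = FakeCursor()
executemany_fallback(cursor, 'SELECT ? + ?', [(1, 2), (3, 4)])
```

The trade-off is one client/server round trip per parameter tuple instead of a single batched call, which matters for large ID lists.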
Index: trunk/tools/wsor/editor_lifecycle/README.rst
@@ -1,7 +1,11 @@
2 -============
3 -README
4 -============
 2+Editor lifecycle
 3+================
54
 5+Author: Giovanni Luca Ciampaglia
 6+
 7+License
 8+-------
 9+
610 Copyright (C) 2011 GIOVANNI LUCA CIAMPAGLIA, GCIAMPAGLIA@WIKIMEDIA.ORG
711 This program is free software; you can redistribute it and/or modify
812 it under the terms of the GNU General Public License as published by
@@ -18,33 +22,54 @@
1923 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
2024 http://www.gnu.org/copyleft/gpl.html
2125
22 -workflow
 26+Installation
 27+------------
2328
24 -This package is a collection of python and shell scripts that can assist
25 -creating and analyzing data on user life cycle.
 29+To install this package you can use the normal distutils command::
2630
27 -Sample selection
 31+ python setup.py install
2832
29 -TBD
 33+See http://docs.python.org/install/index.html#install-index for more options.
 34+You might require root access (sudo) to perform a system-wide installation.
3035
31 -Edit activity data collection
 36+Usage
 37+-----
 38+See http://meta.wikimedia.org/wiki/Research:Editor_lifecycle. All scripts
 39+accept arguments from the command line and understand the common -h/--help
 40+option.
3241
33 -First use `fetchrates` to download the rate data from the MySQL database. This
34 -script takes a user_id in input (and stores the rate data in a file called
35 -<user_id>.npy). This script can be parallelized. At the end you will end up with
36 -a bunch of NPY files.
 42+Workflow
 43+--------
3744
38 -Cohort selection
 45+1. Fetch user rates using `ratesnobots.sql`::
3946
40 -See the docstring in `mkcohort`.
 47+ mysql -BN < ratesnobots.sql > rates.tsv
4148
42 -Cohort analysis
 49+Note: running this query requires access to an internal resource of the
 50+Wikimedia Foundation; see http://collab.wikimedia.org/wiki/WSoR_datasets/bot
 51+for more information. If you can't access that page, you can recreate this
 52+information from a public dump of the `user_groups` and `user` tables in the
 53+following way:
4354
44 -See `graphlife`, `fitting`, `fitting_batch.sh`, and `relax`.
 55+ a. Gather user names from the bot status page
 56+ (http://en.wikipedia.org/w/index.php?title=Wikipedia:Bots/Status) and the
 57+ list of bots by number of edits
 58+ (http://en.wikipedia.org/wiki/Wikipedia:List_of_bots_by_number_of_edits)
4559
 60+ b. Select the user IDs of the gathered user names from `user`
 61+
 62+ c. Take the union of the above data with `user_groups`::
 63+
 64+ SELECT DISTINCT ug_user FROM user_groups WHERE ug_group = 'bot'
 65+
 66+2. Use `mkcohort` to make cohorts. This will create a file where each line is a
 67+ cohort, specified by the first two columns. Columns after the second are the
 68+ IDs of users.
 69+
 70+3. Use `fetchrates` to fetch daily edit counts using the cohort data. See
 71+ `sge/rates.sh` if you want to run this query from within the toolserver.
 72+
 73+4. At this point you can use the other utilities to analyze the rate data. To
 74+ compute and plot activity peaks, use `comppeak` and `plotpeak`.
 75+
 76+5. Happy hacking/researching!
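The bot-filtering, cohort-file, and peak-computation steps above can be sketched end to end with the standard library. This is a toy illustration under stated assumptions, not the actual `mkcohort`/`fetchrates`/`comppeak` code: the table schemas are simplified stand-ins for the MediaWiki `user`/`user_groups` tables, and all helper names are hypothetical.

```python
import sqlite3
from collections import Counter
from datetime import date

# Step 1 (recreating the bot list without the internal resource):
# union of manually gathered bot user names, resolved to IDs via
# `user`, with the IDs flagged as "bot" in `user_groups`.
db = sqlite3.connect(':memory:')
db.executescript("""
    CREATE TABLE user (user_id INTEGER, user_name TEXT);
    CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);
""")
db.executemany('INSERT INTO user VALUES (?, ?)',
               [(1, 'Alice'), (2, 'ClueBot'), (3, 'Bob')])
db.executemany('INSERT INTO user_groups VALUES (?, ?)',
               [(3, 'sysop'), (4, 'bot')])

gathered_bot_names = ['ClueBot']  # from the two wiki pages in step 1a
placeholders = ','.join('?' * len(gathered_bot_names))
rows = db.execute(f"""
    SELECT user_id FROM user WHERE user_name IN ({placeholders})
    UNION
    SELECT DISTINCT ug_user FROM user_groups WHERE ug_group = 'bot'
""", gathered_bot_names).fetchall()
bot_ids = sorted(r[0] for r in rows)

# Step 2 (cohort file format): the first two columns identify the
# cohort, the remaining columns are the user IDs in that cohort.
line = '2005-01\tregistered\t7\t12\t41'
fields = line.split('\t')
cohort_key, user_ids = tuple(fields[:2]), [int(f) for f in fields[2:]]

# Steps 3-4: aggregate per-edit timestamps into daily edit counts and
# take the activity peak as the day with the most edits.
timestamps = [date(2005, 1, 3), date(2005, 1, 3), date(2005, 1, 4)]
daily = Counter(timestamps)
peak_day, peak_rate = max(daily.items(), key=lambda kv: kv[1])
```

With these toy tables, `bot_ids` comes out as `[2, 4]` (ClueBot's row in `user` plus the `user_groups` entry), and the peak falls on 2005-01-03 with two edits.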
