r104758 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r104758 (previous: r104757, next: r104759)
Date: 21:49, 30 November 2011
Author: catrope
Status: resolved
Tags:
Comment:
Script for cleaning up the mess left behind by bug 31576. Still haven't figured out what's causing this bug, but I've got it narrowed down to the job runners
Modified paths:
  • /trunk/extensions/WikimediaMaintenance/cleanupBug31576.php (added)

Diff

Index: trunk/extensions/WikimediaMaintenance/cleanupBug31576.php
@@ -0,0 +1,64 @@
+<?php
+$IP = getenv( 'MW_INSTALL_PATH' );
+if ( $IP === false ) {
+	$IP = dirname( __FILE__ ) . '/../..';
+}
+require( "$IP/maintenance/Maintenance.php" );
+
+class CleanupBug31576 extends Maintenance {
+	public function __construct() {
+		parent::__construct();
+		$this->mDescription = "Cleans up templatelinks corruption caused by https://bugzilla.wikimedia.org/show_bug.cgi?id=31576";
+		$this->addOption( 'batchsize', 'Number of rows to process in one batch. Default: 50', false, true );
+	}
+
+	public function execute() {
+		$this->batchsize = $this->getOption( 'batchsize', 50 );
+		$variableIDs = MagicWord::getVariableIDs();
+		foreach ( $variableIDs as $id ) {
+			$magic = MagicWord::get( $id );
+			foreach ( $magic->getSynonyms() as $synonym ) {
+				$this->processSynonym( $synonym );
+			}
+		}
+		$this->output( "All done\n" );
+	}
+
+	public function processSynonym( $synonym ) {
+		$dbr = wfGetDB( DB_SLAVE );
+		$pCount = 0;
+		$vCount = 0;
+		$this->output( "Fixing pages with template links to $synonym ...\n" );
+		while ( true ) {
+			$res = $dbr->select( 'templatelinks', array( 'tl_title', 'tl_from' ),
+				array(
+					'tl_namespace' => NS_TEMPLATE,
+					'tl_title ' . $dbr->buildLike( $synonym, $dbr->anyString() )
+				), __METHOD__,
+				array( 'ORDER BY' => array( 'tl_title', 'tl_from' ), 'LIMIT' => $this->batchsize )
+			);
+			if ( $dbr->numRows( $res ) == 0 ) {
+				// No more rows, we're done
+				break;
+			}
+
+			$processed = array();
+			foreach ( $res as $row ) {
+				$vCount++;
+				if ( isset( $processed[$row->tl_from] ) ) {
+					// We've already processed this page, skip it
+					continue;
+				}
+				RefreshLinks::fixLinksFromArticle( $row->tl_from );
+				$processed[$row->tl_from] = true;
+				$pCount++;
+			}
+			$this->output( "{$pCount}/{$vCount} pages processed\n" );
+			wfWaitForSlaves();
+		}
+	}
+
+}
+
+$maintClass = "CleanupBug31576";
+require_once( RUN_MAINTENANCE_IF_MAIN );
\ No newline at end of file
Property changes on: trunk/extensions/WikimediaMaintenance/cleanupBug31576.php
___________________________________________________________________
Added: svn:eol-style
   + native

Follow-up revisions

Revision | Commit summary | Author | Date
r105964 | Temporary workaround for bug 31576. The logs show that once every hour or so,... | tstarling | 01:14, 13 December 2011
r113928 | Followup r104758... | reedy | 16:09, 15 March 2012

Past revisions this follows-up on

Revision | Commit summary | Author | Date
r104059 | 1.18wmf1: Live hacks for investigating bug 31576: add server name to pcache c... | catrope | 18:04, 23 November 2011
r104701 | 1.18wmf1: Another live debugging hack for bug 31576 | catrope | 16:57, 30 November 2011
r104722 | 1.18wmf1: Another logging hack for bug 31576 | catrope | 19:26, 30 November 2011
r104732 | Add RefreshLinks class to the AutoLoader, I'll need it for my cleanup script ... | catrope | 20:03, 30 November 2011

Comments

Comment by Catrope, 22:59, 30 November 2011

Note for merging: this depends on r104732

Comment by 😂, 14:45, 1 December 2011

If you use setBatchSize( 50 ) in the constructor, you can skip setting and getting the option manually and just use $mBatchSize.

Comment by Catrope, 10:51, 12 December 2011

I don't quite understand. Are you saying the Maintenance class has built-in support for a batchsize parameter?

Comment by 😂, 12:19, 12 December 2011

Yes
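
To make the suggestion above concrete, here is a minimal sketch of how the batch size could be handled without a hand-rolled option. MaintenanceStub and loadOptions() are hypothetical stand-ins written only so the snippet runs on its own; they mimic the two members named in the comment (setBatchSize() and $mBatchSize) and are not MediaWiki's implementation. The 'batch-size' key is likewise an assumption about how a command-line override would arrive.

```php
<?php
// Hypothetical stand-in for MediaWiki's Maintenance base class, reduced to the
// two members mentioned in the review comment. The real class lives in
// maintenance/Maintenance.php and does much more.
abstract class MaintenanceStub {
	protected $mBatchSize = null;

	protected function setBatchSize( $s = 0 ) {
		$this->mBatchSize = $s;
	}

	// Assumption: models a command-line value replacing the default.
	public function loadOptions( array $options ) {
		if ( isset( $options['batch-size'] ) ) {
			$this->mBatchSize = intval( $options['batch-size'] );
		}
	}

	abstract public function execute();
}

class CleanupBug31576Sketch extends MaintenanceStub {
	public function __construct() {
		// Declares the default once; no addOption()/getOption() pair needed.
		$this->setBatchSize( 50 );
	}

	public function execute() {
		// The script would use $this->mBatchSize as the query LIMIT.
		return $this->mBatchSize;
	}
}

$script = new CleanupBug31576Sketch();
$default = $script->execute();
$script->loadOptions( array( 'batch-size' => '200' ) );
$overridden = $script->execute();
```

In the actual script, the same idea would mean calling setBatchSize( 50 ) in the constructor and reading $this->mBatchSize where the diff uses $this->batchsize.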

Comment by Reedy, 23:03, 13 March 2012

I'm not sure this maintenance script is doing what it's supposed to do, so I've just cancelled them all for the moment.

For example, abwiki has fewer than 3,000 pages and just over 6,000 templatelinks rows. Why is the script reporting that it has processed over 800,000 pages and counting? Shouldn't it need to refresh links on (at worst!) every page?

Should the resetting of $processed to an empty array be moved to the top of the processSynonym method? Or even moved to a class variable, so we only attempt to parse any page once per wiki?

mysql> select count(*) from page\G
*************************** 1. row ***************************
count(*): 2871
1 row in set (0.00 sec)

mysql> select count(*) from templatelinks\G
*************************** 1. row ***************************
count(*): 6046
1 row in set (0.00 sec)

Marking fixme so it gets some attention

Comment by Catrope, 00:03, 15 March 2012

Yeah, moving $processed outside of the loop sounds good. I don't know what else would be wrong here, IIRC it Just Worked last time around. Or maybe I only tested it on my local wiki where $dbw==$dbr and that was the problem.
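
The fix discussed above can be sketched on a toy model. This is not the content of any follow-up revision and not MediaWiki code: $links stands in for the templatelinks rows matching one synonym, and fetchBatch(), isAfterCursor() and refreshAll() are hypothetical names. The sketch keeps $processed alive for the whole run, as proposed, and additionally assumes a ( tl_title, tl_from ) cursor so that each query starts after the last row seen. The cursor is an addition of this sketch, guarding against the case where matching rows do not disappear after a refresh and the same first batch would otherwise be read again.

```php
<?php
// Toy model: each row is array( tl_title, tl_from ).

// True if $row sorts strictly after the ( title, from ) cursor.
function isAfterCursor( array $row, $cursor ) {
	if ( $cursor === null ) {
		return true;
	}
	$cmp = strcmp( $row[0], $cursor[0] );
	return $cmp > 0 || ( $cmp === 0 && $row[1] > $cursor[1] );
}

// Stand-in for the SELECT ... ORDER BY tl_title, tl_from LIMIT $limit query.
function fetchBatch( array $links, $cursor, $limit ) {
	$batch = array();
	foreach ( $links as $row ) {
		if ( count( $batch ) >= $limit ) {
			break;
		}
		if ( isAfterCursor( $row, $cursor ) ) {
			$batch[] = $row;
		}
	}
	return $batch;
}

// Returns the page IDs that would be refreshed, each at most once.
function refreshAll( array $links, $batchSize ) {
	usort( $links, function ( $a, $b ) {
		$cmp = strcmp( $a[0], $b[0] );
		return $cmp !== 0 ? $cmp : $a[1] - $b[1];
	} );
	$processed = array(); // kept across batches, not reset per batch
	$cursor = null;
	$refreshed = array();
	while ( true ) {
		$batch = fetchBatch( $links, $cursor, $batchSize );
		if ( count( $batch ) === 0 ) {
			break;
		}
		foreach ( $batch as $row ) {
			$cursor = $row; // always advance, even for skipped rows
			if ( isset( $processed[$row[1]] ) ) {
				continue;
			}
			$processed[$row[1]] = true;
			// The real script would call RefreshLinks::fixLinksFromArticle() here.
			$refreshed[] = $row[1];
		}
	}
	return $refreshed;
}

$links = array(
	array( 'PAGENAME', 1 ), array( 'PAGENAME', 2 ),
	array( 'PAGENAME/x', 1 ), array( 'PAGENAMEE', 2 ), array( 'PAGENAMEE', 3 ),
);
$pages = refreshAll( $links, 2 );
echo count( $pages ) . " pages refreshed\n";
```

With five rows spread over three pages, the loop ends after refreshing three pages, whether or not the rows are deleted along the way.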
