r66140 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r66140
Date:15:47, 10 May 2010
Author:maxsem
Status:reverted
Tags:
Comment:
(bug 18488) Added maintenance script refreshCategoryCounts.php. Based on Happy-Mellon's patch from that bug, heavily refactored by me
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified)
  • /trunk/phase3/maintenance/refreshCategoryCounts.php (added)

Diff

Index: trunk/phase3/maintenance/refreshCategoryCounts.php
@@ -0,0 +1,102 @@
+<?php
+/**
+ * This script will refresh the cat_pages, cat_subcats and cat_files fields of
+ * the category table, which may be incorrect if the wiki ran the corrupted
+ * version of Article::doDeleteArticle (r40912 --> r47326); see explanation at
+ * [https://bugzilla.wikimedia.org/show_bug.cgi?id=17155]. It will print out
+ * progress indicators every 1000 categories it updates. You may want to use the
+ * throttling options if it's causing too much load; they will not affect
+ * correctness.
+ *
+ * If the script is stopped and later resumed, you can use the --start option
+ * with the last printed progress indicator to pick up where you left off.
+ * This is safe, because any newly-added categories will be added at the end of
+ * the table.
+ *
+ * @file
+ * @ingroup Maintenance
+ * @author Happy-melon, Max Semenik
+ * Based on /maintenance/populateCategory.php by Simetrical.
+ */
+
+require_once( dirname( __FILE__ ) . '/Maintenance.php' );
+
+class RefreshCategoryCounts extends Maintenance {
+	const REPORTING_INTERVAL = 1000;
+
+	public function __construct() {
+		$this->mDescription = 'Refreshes category counts';
+		$this->addOption( 'start', 'Start from this category ID', false, true );
+		$this->addOption( 'maxlag', 'Maximum database slave lag in seconds (5 by default)', false, true );
+		$this->addOption( 'throttle', 'Optional delay after every processed category in milliseconds',
+			false, true );
+	}
+
+	public function execute() {
+		$start = intval( $this->getOption( 'start', 0 ) );
+		$maxlag = intval( $this->getOption( 'maxlag', 5 ) );
+		$throttle = intval( $this->getOption( 'throttle', 0 ) );
+
+		$this->doRefresh( $start, $maxlag, $throttle );
+	}
+
+	protected function doRefresh( $start, $maxlag, $throttle ) {
+		$dbw = wfGetDB( DB_MASTER );
+
+		$maxlag = intval( $maxlag );
+		$throttle = intval( $throttle );
+		$id = $start;
+
+		$i = 0;
+		while ( true ) {
+			# Find which category to update
+			$row = $dbw->selectRow(
+				'category',
+				array( 'cat_id', 'cat_title' ),
+				'cat_id > ' . $dbw->addQuotes( $id ),
+				__METHOD__,
+				array( 'ORDER BY' => 'cat_id' )
+			);
+			if ( !$row ) {
+				# Done, hopefully.
+				break;
+			}
+			$id = $row->cat_id;
+			$name = $row->cat_title;
+
+			# Use the row to update the category count
+			$cat = Category::newFromName( $name );
+			if ( !is_object( $cat ) ) {
+				$this->output( "Invalid category name '$name'\n" );
+			} else {
+				$cat->refreshCounts();
+			}
+
+			$i++;
+			if ( !( $i % self::REPORTING_INTERVAL ) ) {
+				$this->output( "$id\n" );
+				wfWaitForSlaves( $maxlag );
+			}
+			usleep( $throttle * 1000 );
+		}
+
+		/*if ( $dbw->insert(
+				'updatelog',
+				array( 'ul_key' => 'refresh catgory counts' ),
+				__METHOD__,
+				'IGNORE'
+			)
+		) {
+			$this->output( "Category count refresh complete.\n" );
+			return true;
+		} else {
+			$this->output( "Could not insert category population row.\n" );
+			return false;
+		}*/
+		$this->output( "Category count refresh complete.\n" );
+	}
+}
+
+$maintClass = "RefreshCategoryCounts";
+require_once( DO_MAINTENANCE );
+
Property changes on: trunk/phase3/maintenance/refreshCategoryCounts.php
___________________________________________________________________
Name: svn:eol-style
   + native
Index: trunk/phase3/RELEASE-NOTES
@@ -71,6 +71,7 @@
 * (bug 20976) "searchmenu-new-nocreate" message now displayed when when there
   is no title match in search and the user has no rights to create pages.
 * (bug 23429) Added new hook WatchlistEditorBuildRemoveLine
+* (bug 18488) Added maintenance script refreshCategoryCounts.php
 
 === Bug fixes in 1.17 ===
 * (bug 17560) Half-broken deletion moved image files to deletion archive
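
The script's doc comment says a stopped run can be resumed with --start and the last printed progress indicator, because the loop always asks for the next row with cat_id greater than the current one. The following is a minimal, self-contained sketch of that cursor pattern, not the committed code: the array stands in for the category table, and the data and the helper names nextCategoryRow() and walkCategories() are hypothetical.

```php
<?php
// Hypothetical stand-in for the category table: cat_id => cat_title.
// The IDs and titles are invented for illustration.
$categoryTable = array(
	1 => 'Physics',
	2 => 'Chemistry',
	5 => 'Biology',
	8 => 'Geology',
	9 => 'Astronomy',
);

/**
 * Mimics a "cat_id > $afterId ORDER BY cat_id" single-row lookup:
 * returns the first row whose ID is greater than $afterId, or null.
 */
function nextCategoryRow( array $table, $afterId ) {
	ksort( $table );
	foreach ( $table as $id => $title ) {
		if ( $id > $afterId ) {
			return array( 'cat_id' => $id, 'cat_title' => $title );
		}
	}
	return null;
}

/**
 * Walks the table from $start and returns the titles that were visited
 * plus the IDs that would have been printed as progress indicators.
 */
function walkCategories( array $table, $start, $reportingInterval ) {
	$id = $start;
	$i = 0;
	$processed = array();
	$progress = array();
	while ( ( $row = nextCategoryRow( $table, $id ) ) !== null ) {
		$id = $row['cat_id'];
		// The real script refreshes the category's counts at this point.
		$processed[] = $row['cat_title'];
		$i++;
		if ( $i % $reportingInterval === 0 ) {
			$progress[] = $id;
		}
	}
	return array( 'processed' => $processed, 'progress' => $progress );
}

// A complete run, reporting every second category.
$full = walkCategories( $categoryTable, 0, 2 );
// A resumed run, started from the first progress indicator of the full run.
$resumed = walkCategories( $categoryTable, $full['progress'][0], 2 );

print_r( $full['progress'] );
print_r( $resumed['processed'] );
```

Because the cursor is the ID itself, a resumed run skips only rows at or below the given ID. Rows inserted later with higher IDs are still reached, which is the property the doc comment relies on.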

Follow-up revisions

Revision | Commit summary | Author | Date
r66155 | 1.16wmf4: MFT r66140 | catrope | 19:59, 10 May 2010
r75611 | Revert r66140 per CR | maxsem | 13:40, 28 October 2010

Comments

#Comment by Simetrical   21:14, 7 June 2010

Um, isn't this a duplicate of populateCategory.php? We probably don't want to keep both of them. Bug 18488 says "I *think* that /maintenance/populateCategory.php would work, but that's not really what it was designed for, and it's certainly not the most efficient way of doing it" -- but the implementation seems virtually identical. In fact, large chunks of the new script were just copy-pasted from populateCategory.php. The only real difference I notice is reading IDs from the category table instead of names from categorylinks, but I don't see how that matters.

So . . . what was the idea here again?

#Comment by 😂   14:35, 18 October 2010

MaxSem: Ping ;-)

#Comment by MaxSem   17:36, 20 October 2010

Hm, at least this script was run on WMF, not populateCategory :) It should apparently run faster because the category table is smaller than categorylinks (~1M vs 33M+ rows on enwiki) and the field it scans is smaller (int cat_id vs varchar(255) cl_to), though I wonder how much of a gain that is.

#Comment by Simetrical   20:38, 21 October 2010

I'd be surprised if it makes any big difference. You're scanning the whole categorylinks table anyway, after all, to do the counting. Why do you think it matters?

But if it does make a difference, you should add a switch to populateCategory.php. The bulk of the code is literally a line-by-line copy-paste duplicate of populateCategory.php with a few lines changed, and that kind of gratuitous code duplication is just not acceptable. Code needs to be reused, not copied.
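
One way to read the suggestion above is a single loop with a switch that selects where the next category comes from: IDs from the category table, or names from categorylinks as the comments describe populateCategory.php doing. The sketch below only illustrates that shape under stated assumptions. It is not code from either script, the arrays stand in for the two tables with invented data, and nextById(), nextByName() and collectTitles() are hypothetical names.

```php
<?php
// Stand-in for the category table (cat_id => cat_title); invented data.
$category = array( 3 => 'Birds', 1 => 'Apes', 2 => 'Cats' );
// Stand-in for categorylinks.cl_to, one entry per link; invented data.
$categorylinks = array( 'Cats', 'Apes', 'Cats', 'Birds', 'Apes' );

/** Next category in ascending cat_id order: returns array( cursor, title ) or null. */
function nextById( array $category, $cursor ) {
	ksort( $category );
	foreach ( $category as $id => $title ) {
		if ( $id > $cursor ) {
			return array( $id, $title );
		}
	}
	return null;
}

/** Next category in ascending cl_to order: returns array( cursor, title ) or null. */
function nextByName( array $links, $cursor ) {
	$names = array_unique( $links );
	sort( $names );
	foreach ( $names as $name ) {
		if ( strcmp( $name, $cursor ) > 0 ) {
			return array( $name, $name );
		}
	}
	return null;
}

/** Shared loop; $useCategoryTable plays the role of the proposed switch. */
function collectTitles( $useCategoryTable, array $category, array $links ) {
	$cursor = $useCategoryTable ? 0 : '';
	$titles = array();
	while ( true ) {
		$next = $useCategoryTable
			? nextById( $category, $cursor )
			: nextByName( $links, $cursor );
		if ( $next === null ) {
			break;
		}
		list( $cursor, $title ) = $next;
		// Both scripts would refresh the category's counts at this point.
		$titles[] = $title;
	}
	return $titles;
}

$byId = collectTitles( true, $category, $categorylinks );
$byName = collectTitles( false, $category, $categorylinks );

print_r( $byId );
print_r( $byName );
```

Only the lookup differs between the two modes. The per-category work and the loop control are written once, which is the reuse the comment asks for.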
