r32085 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r32085
Date: 00:17, 18 March 2008
Author: simetrical
Status: old
Tags:
Comment:
This is a schema change. It's only a table creation, but the table must be created on Wikimedia servers before this revision goes live. The maintenance script populateCategory.php should be run when convenient. If it's not run, there's only one substantial case where display will be harmed: the page of a category with more than 200 net pages added since the patch goes live will give an erroneously low count. In all other cases, category pages will simply be worded better, and the code will recognize that the count in the table is bogus.
* Adds Category and CategoryList classes to represent categories themselves.
* Adds a category table, giving each category a name, ID, and counts of all members, subcats only, and files.
* Adds a maintenance script to populate the category table efficiently. This script is careful to wait for slaves and should be safe to run on a live database. The maintenance script's includes file is called by update.php.
* Until the category table is populated, the patch handles weird category table rows gracefully. It detects whether they're obviously impossible, and if so, it outputs appropriate messages.
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified) (history)
  • /trunk/phase3/includes/Article.php (modified) (history)
  • /trunk/phase3/includes/AutoLoader.php (modified) (history)
  • /trunk/phase3/includes/Category.php (added) (history)
  • /trunk/phase3/includes/CategoryPage.php (modified) (history)
  • /trunk/phase3/includes/LinksUpdate.php (modified) (history)
  • /trunk/phase3/languages/messages/MessagesEn.php (modified) (history)
  • /trunk/phase3/maintenance/archives/patch-category.sql (added) (history)
  • /trunk/phase3/maintenance/populateCategory.inc (added) (history)
  • /trunk/phase3/maintenance/populateCategory.php (added) (history)
  • /trunk/phase3/maintenance/tables.sql (modified) (history)
  • /trunk/phase3/maintenance/updaters.inc (modified) (history)

Diff

Index: trunk/phase3/maintenance/archives/patch-category.sql
@@ -0,0 +1,17 @@
 2+CREATE TABLE /*$wgDBprefix*/category (
 3+ cat_id int unsigned NOT NULL auto_increment,
 4+
 5+ cat_title varchar(255) binary NOT NULL,
 6+
 7+ cat_pages int signed NOT NULL default 0,
 8+ cat_subcats int signed NOT NULL default 0,
 9+ cat_files int signed NOT NULL default 0,
 10+
 11+ cat_hidden tinyint(1) unsigned NOT NULL default 0,
 12+
 13+ PRIMARY KEY (cat_id),
 14+ UNIQUE KEY (cat_title),
 15+
 16+ KEY (cat_pages)
 17+) /*$wgDBTableOptions*/;
 18+
Property changes on: trunk/phase3/maintenance/archives/patch-category.sql
___________________________________________________________________
Added: svn:eol-style
119 + native
Index: trunk/phase3/maintenance/populateCategory.inc
@@ -0,0 +1,84 @@
 2+<?php
 3+/**
 4+ * @addtogroup Maintenance
 5+ * @author Simetrical
 6+ */
 7+
 8+define( 'REPORTING_INTERVAL', 1000 );
 9+
 10+function populateCategory( $begin, $maxlag, $throttle, $force ) {
 11+ $dbw = wfGetDB( DB_MASTER );
 12+
 13+ if( !$force ) {
 14+ $row = $dbw->selectRow(
 15+ 'updatelog',
 16+ '1',
 17+ array( 'ul_key' => 'populate category' ),
 18+ __FUNCTION__
 19+ );
 20+ if( $row ) {
 21+ echo "Category table already populated. Use php ".
 22+ "maintenance/populateCategory.php\n--force from the command line ".
 23+ "to override.\n";
 24+ return true;
 25+ }
 26+ }
 27+
 28+ $maxlag = intval( $maxlag );
 29+ $throttle = intval( $throttle );
 30+ $force = (bool)$force;
 31+ if( $begin !== '' ) {
 32+ $where = 'cl_to > '.$dbw->addQuotes( $begin );
 33+ } else {
 34+ $where = null;
 35+ }
 36+ $i = 0;
 37+
 38+ while( true ) {
 39+ # Find which category to update
 40+ $row = $dbw->selectRow(
 41+ 'categorylinks',
 42+ 'cl_to',
 43+ $where,
 44+ __FUNCTION__,
 45+ array(
 46+ 'ORDER BY' => 'cl_to'
 47+ )
 48+ );
 49+ if( !$row ) {
 50+ # Done, hopefully.
 51+ break;
 52+ }
 53+ $name = $row->cl_to;
 54+ $where = 'cl_to > '.$dbw->addQuotes( $name );
 55+
 56+ # Use the row to update the category count
 57+ $cat = Category::newFromName( $name );
 58+ if( !is_object( $cat ) ) {
 59+ var_dump( $cat );
 60+ throw new MWException( "The category named $name is not valid?!" );
 61+ }
 62+ $cat->refreshCounts();
 63+
 64+ ++$i;
 65+ if( !($i % REPORTING_INTERVAL) ) {
 66+ echo "$name\n";
 67+ wfWaitForSlaves( $maxlag );
 68+ }
 69+ usleep( $throttle*1000 );
 70+ }
 71+
 72+ if( $dbw->insert(
 73+ 'updatelog',
 74+ array( 'ul_key' => 'populate category' ),
 75+ __FUNCTION__,
 76+ 'IGNORE'
 77+ )
 78+ ) {
 79+ echo "Category population complete.\n";
 80+ return true;
 81+ } else {
 82+ echo "Could not insert category population row.\n";
 83+ return false;
 84+ }
 85+}
Property changes on: trunk/phase3/maintenance/populateCategory.inc
___________________________________________________________________
Added: svn:eol-style
186 + native
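
Review note: the loop in populateCategory() repeatedly asks for the first cl_to value greater than the last one it processed, which is also what makes --begin resumable. The following is a minimal sketch of that pattern over an in-memory list. The names nextAfter and walkFrom are hypothetical, and the array only stands in for the categorylinks table; it is not the script's actual code.

```php
<?php
/**
 * Hypothetical stand-in for "SELECT cl_to ... WHERE cl_to > $last
 * ORDER BY cl_to", taking the first row. Returns the first name strictly
 * after $last, or null when none is left. $last === null means "start
 * from the beginning".
 */
function nextAfter( array $sortedNames, $last ) {
	foreach ( $sortedNames as $name ) {
		if ( $last === null || strcmp( $name, $last ) > 0 ) {
			return $name;
		}
	}
	return null;
}

/** Visit every distinct name after $begin, in order; return what was visited. */
function walkFrom( array $names, $begin = null ) {
	$names = array_values( array_unique( $names ) );
	sort( $names, SORT_STRING );
	$visited = array();
	$last = $begin;
	while ( ( $name = nextAfter( $names, $last ) ) !== null ) {
		$visited[] = $name; // the real script would refresh counts here
		$last = $name;      // remember the cursor for the next lookup
	}
	return $visited;
}
```

Because the cursor is a category name rather than a row offset, restarting with the last printed name skips exactly the categories that were already handled.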
Index: trunk/phase3/maintenance/populateCategory.php
@@ -0,0 +1,51 @@
 2+<?php
 3+/**
 4+ * @addtogroup Maintenance
 5+ * @author Simetrical
 6+ */
 7+
 8+$optionsWithArgs = array( 'begin', 'max-slave-lag', 'throttle' );
 9+
 10+require_once "commandLine.inc";
 11+require_once "populateCategory.inc";
 12+
 13+if( isset( $options['help'] ) ) {
 14+ echo <<<TEXT
 15+This script will populate the category table, added in MediaWiki 1.13. It will
 16+print out progress indicators every 1000 categories it adds to the table. The
 17+script is perfectly safe to run on large, live wikis, and running it multiple
 18+times is harmless. You may want to use the throttling options if it's causing
 19+too much load; they will not affect correctness.
 20+
 21+If the script is stopped and later resumed, you can use the --begin option with
 22+the last printed progress indicator to pick up where you left off. This is
 23+safe, because any newly-added categories before this cutoff will have been
 24+added after the software update and so will be populated anyway.
 25+
 26+When the script has finished, it will make a note of this in the database, and
 27+will not run again without the --force option.
 28+
 29+Usage:
 30+ php populateCategory.php [--max-slave-lag <seconds>] [--begin <name>]
 31+[--throttle <milliseconds>] [--force]
 32+
 33+ --begin: Only do categories whose names are alphabetically after the pro-
 34+vided name. Default: empty (start from beginning).
 35+ --max-slave-lag: If slave lag exceeds this many seconds, wait until it
 36+drops before continuing. Default: 10.
 37+ --throttle: Wait this many milliseconds after each category. Default: 0.
 38+ --force: Run regardless of whether the database says it's been run already.
 39+TEXT;
 40+ exit( 0 );
 41+}
 42+
 43+$defaults = array(
 44+ 'begin' => '',
 45+ 'max-slave-lag' => 10,
 46+ 'throttle' => 0,
 47+ 'force' => false
 48+);
 49+$options = array_merge( $defaults, $options );
 50+
 51+populateCategory( $options['begin'], $options['max-slave-lag'],
 52+ $options['throttle'], $options['force'] );
Property changes on: trunk/phase3/maintenance/populateCategory.php
___________________________________________________________________
Added: svn:eol-style
153 + native
Index: trunk/phase3/maintenance/updaters.inc
@@ -133,6 +133,8 @@
134134 array( 'add_field', 'ipblocks', 'ipb_by_text', 'patch-ipb_by_text.sql' ),
135135 array( 'add_table', 'page_props', 'patch-page_props.sql' ),
136136 array( 'add_table', 'updatelog', 'patch-updatelog.sql' ),
 137+ array( 'add_table', 'category', 'patch-category.sql' ),
 138+ array( 'do_category_population' ),
137139 );
138140
139141
@@ -1135,6 +1137,20 @@
11361138 }
11371139 }
11381140
 1141+function do_category_population() {
 1142+ if( update_row_exists( 'populate category' ) ) {
 1143+ echo "...category table already populated.\n";
 1144+ return;
 1145+ }
 1146+ require_once( 'populateCategory.inc' );
 1147+ echo "Populating category table, printing progress markers. ".
 1148+"For large databases, you\n".
 1149+"may want to hit Ctrl-C and do this manually with maintenance/\n".
 1150+"populateCategory.php.\n";
 1151+ populateCategory( '', 10, 0, true );
 1152+ echo "Done populating category table.\n";
 1153+}
 1154+
11391155 function
11401156 pg_describe_table($table)
11411157 {
Index: trunk/phase3/maintenance/tables.sql
@@ -486,7 +486,40 @@
487487
488488 ) /*$wgDBTableOptions*/;
489489
 490+--
 491+-- Track all existing categories. Something is a category if 1) it has an en-
 492+-- try somewhere in categorylinks, or 2) it once did. Categories might not
 493+-- have corresponding pages, so they need to be tracked separately.
490494 --
 495+CREATE TABLE /*$wgDBprefix*/category (
 496+ -- Primary key
 497+ cat_id int unsigned NOT NULL auto_increment,
 498+
 499+ -- Name of the category, in the same form as page_title (with underscores).
 500+ -- If there is a category page corresponding to this category, by definition,
 501+ -- it has this name (in the Category namespace).
 502+ cat_title varchar(255) binary NOT NULL,
 503+
 504+ -- The numbers of member pages (including categories and media), subcatego-
 505+ -- ries, and Image: namespace members, respectively. These are signed to
 506+ -- make underflow more obvious. We make the first number include the second
 507+ -- two for better sorting: subtracting for display is easy, adding for order-
 508+ -- ing is not.
 509+ cat_pages int signed NOT NULL default 0,
 510+ cat_subcats int signed NOT NULL default 0,
 511+ cat_files int signed NOT NULL default 0,
 512+
 513+ -- Should the category be hidden from article views?
 514+ cat_hidden tinyint(1) unsigned NOT NULL default 0,
 515+
 516+ PRIMARY KEY (cat_id),
 517+ UNIQUE KEY (cat_title),
 518+
 519+ -- For Special:Mostlinkedcategories
 520+ KEY (cat_pages)
 521+) /*$wgDBTableOptions*/;
 522+
 523+--
491524 -- Track links to external URLs
492525 --
493526 CREATE TABLE /*$wgDBprefix*/externallinks (
Index: trunk/phase3/includes/CategoryPage.php
@@ -70,6 +70,8 @@
7171 $children, $children_start_char,
7272 $showGallery, $gallery,
7373 $skin;
 74+ /** Category object for this page */
 75+ private $cat;
7476
7577 function __construct( $title, $from = '', $until = '' ) {
7678 global $wgCategoryPagingLimit;
@@ -77,6 +79,7 @@
7880 $this->from = $from;
7981 $this->until = $until;
8082 $this->limit = $wgCategoryPagingLimit;
 83+ $this->cat = Category::newFromName( $title->getDBKey() );
8184 }
8285
8386 /**
@@ -261,12 +264,14 @@
262265 function getSubcategorySection() {
263266 # Don't show subcategories section if there are none.
264267 $r = '';
265 - $c = count( $this->children );
266 - if( $c > 0 ) {
 268+ $rescnt = count( $this->children );
 269+ $dbcnt = $this->cat->getSubcatCount();
 270+ $countmsg = $this->getCountMessage( $rescnt, $dbcnt, 'subcat' );
 271+ if( $rescnt > 0 ) {
267272 # Showing subcategories
268273 $r .= "<div id=\"mw-subcategories\">\n";
269274 $r .= '<h2>' . wfMsg( 'subcategories' ) . "</h2>\n";
270 - $r .= wfMsgExt( 'subcategorycount', array( 'parse' ), $c );
 275+ $r .= $countmsg;
271276 $r .= $this->formatList( $this->children, $this->children_start_char );
272277 $r .= "\n</div>";
273278 }
@@ -277,11 +282,20 @@
278283 $ti = htmlspecialchars( $this->title->getText() );
279284 # Don't show articles section if there are none.
280285 $r = '';
281 - $c = count( $this->articles );
282 - if( $c > 0 ) {
 286+
 287+ # FIXME, here and in the other two sections: we don't need to bother
 288+ # with this rigamarole if the entire category contents fit on one page
 289+ # and have already been retrieved. We can just use $rescnt in that
 290+ # case and save a query and some logic.
 291+ $dbcnt = $this->cat->getPageCount() - $this->cat->getSubcatCount()
 292+ - $this->cat->getFileCount();
 293+ $rescnt = count( $this->articles );
 294+ $countmsg = $this->getCountMessage( $rescnt, $dbcnt, 'article' );
 295+
 296+ if( $rescnt > 0 ) {
283297 $r = "<div id=\"mw-pages\">\n";
284298 $r .= '<h2>' . wfMsg( 'category_header', $ti ) . "</h2>\n";
285 - $r .= wfMsgExt( 'categoryarticlecount', array( 'parse' ), $c );
 299+ $r .= $countmsg;
286300 $r .= $this->formatList( $this->articles, $this->articles_start_char );
287301 $r .= "\n</div>";
288302 }
@@ -290,10 +304,13 @@
291305
292306 function getImageSection() {
293307 if( $this->showGallery && ! $this->gallery->isEmpty() ) {
 308+ $dbcnt = $this->cat->getFileCount();
 309+ $rescnt = $this->gallery->count();
 310+ $countmsg = $this->getCountMessage( $rescnt, $dbcnt, 'file' );
 311+
294312 return "<div id=\"mw-category-media\">\n" .
295313 '<h2>' . wfMsg( 'category-media-header', htmlspecialchars($this->title->getText()) ) . "</h2>\n" .
296 - wfMsgExt( 'category-media-count', array( 'parse' ), $this->gallery->count() ) .
297 - $this->gallery->toHTML() . "\n</div>";
 314+ $countmsg . $this->gallery->toHTML() . "\n</div>";
298315 } else {
299316 return '';
300317 }
@@ -440,6 +457,47 @@
441458
442459 return "($prevLink) ($nextLink)";
443460 }
 461+
 462+ /**
 463+ * What to do if the category table conflicts with the number of results
 464+ * returned? This function says what. It works the same whether the
 465+ * things being counted are articles, subcategories, or files.
 466+ *
 467+ * Note for grepping: uses the messages category-article-count,
 468+ * category-article-count-limited, category-subcat-count,
 469+ * category-subcat-count-limited, category-file-count,
 470+ * category-file-count-limited.
 471+ *
 472+ * @param int $rescnt The number of items returned by our database query.
 473+ * @param int $dbcnt The number of items according to the category table.
 474+ * @param string $type 'subcat', 'article', or 'file'
 475+ * @return string A message giving the number of items, to output to HTML.
 476+ */
 477+ private function getCountMessage( $rescnt, $dbcnt, $type ) {
 478+ # There are three cases:
 479+ # 1) The category table figure seems sane. It might be wrong, but
 480+ # we can't do anything about it if we don't recalculate it on ev-
 481+ # ery category view.
 482+ # 2) The category table figure isn't sane, like it's smaller than the
 483+ # number of actual results, *but* the number of results is less
 484+ # than $this->limit and there's no offset. In this case we still
 485+ # know the right figure.
 486+ # 3) We have no idea.
 487+ $totalrescnt = count( $this->articles ) + count( $this->children ) +
 488+ $this->gallery->count();
 489+ if($dbcnt == $rescnt || (($totalrescnt == $this->limit || $this->from
 490+ || $this->until) && $dbcnt > $rescnt)){
 491+ # Case 1: seems sane.
 492+ $totalcnt = $dbcnt;
 493+ } elseif($totalrescnt < $this->limit && !$this->from && !$this->until){
 494+ # Case 2: not sane, but salvageable.
 495+ $totalcnt = $rescnt;
 496+ } else {
 497+ # Case 3: hopeless. Don't give a total count at all.
 498+ return wfMsgExt("category-$type-count-limited", 'parse', $rescnt);
 499+ }
 500+ return wfMsgExt( "category-$type-count", 'parse', $rescnt, $totalcnt );
 501+ }
444502 }
445503
446504
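
Review note: the three cases described in getCountMessage() can be read as a pure decision function. The sketch below restates the conditions from the diff without the class state. pickTotalCount is a hypothetical name, $paged stands for "from or until is set", and a null return stands for the "-limited" message that shows no total.

```php
<?php
/**
 * Decide which total to display, following the three cases in the diff.
 *
 * @param int  $rescnt      Items of this type returned by the query
 * @param int  $dbcnt       Items of this type according to the category table
 * @param int  $totalrescnt Items of all types on the current page
 * @param int  $limit       Paging limit
 * @param bool $paged       True if a from/until offset is in effect
 * @return int|null Total to show, or null if no trustworthy total exists
 */
function pickTotalCount( $rescnt, $dbcnt, $totalrescnt, $limit, $paged ) {
	if ( $dbcnt == $rescnt
		|| ( ( $totalrescnt == $limit || $paged ) && $dbcnt > $rescnt )
	) {
		// Case 1: the table figure seems sane.
		return $dbcnt;
	} elseif ( $totalrescnt < $limit && !$paged ) {
		// Case 2: table figure is off, but the whole category is on this page.
		return $rescnt;
	}
	// Case 3: hopeless, show no total.
	return null;
}
```

For example, an unpopulated table (count 0) on a category whose three members fit on one page falls into case 2, while the same table on a full page of 200 results falls into case 3.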
Index: trunk/phase3/includes/Article.php
@@ -2259,12 +2259,20 @@
22602260 # Delete restrictions for it
22612261 $dbw->delete( 'page_restrictions', array ( 'pr_page' => $id ), __METHOD__ );
22622262
 2263+ # Fix category table counts
 2264+ $cats = array();
 2265+ $res = $dbw->select( 'categorylinks', 'cl_to',
 2266+ array( 'cl_from' => $id ), __METHOD__ );
 2267+ foreach( $res as $row ) {
 2268+ $cats []= $row->cl_to;
 2269+ }
 2270+ $this->updateCategoryCounts( array(), $cats, $dbw );
 2271+
22632272 # Now that it's safely backed up, delete it
22642273 $dbw->delete( 'page', array( 'page_id' => $id ), __METHOD__);
22652274
22662275 # If using cascading deletes, we can skip some explicit deletes
22672276 if ( !$dbw->cascadingDeletes() ) {
2268 -
22692277 $dbw->delete( 'revision', array( 'rev_page' => $id ), __METHOD__ );
22702278
22712279 if ($wgUseTrackbacks)
@@ -3340,4 +3348,55 @@
33413349 $wgOut->addParserOutput( $parserOutput );
33423350 }
33433351
 3352+ /**
 3353+ * Update all the appropriate counts in the category table, given that
 3354+ * we've added the categories $added and deleted the categories $deleted.
 3355+ *
 3356+ * @param $added array The names of categories that were added
 3357+ * @param $deleted array The names of categories that were deleted
 3358+ * @param $dbw Database Optional database connection to use
 3359+ * @return null
 3360+ */
 3361+ public function updateCategoryCounts( $added, $deleted, $dbw = null ) {
 3362+ $ns = $this->mTitle->getNamespace();
 3363+ if( !$dbw ) {
 3364+ $dbw = wfGetDB( DB_MASTER );
 3365+ }
 3366+
 3367+ # First make sure the rows exist. If one of the "deleted" ones didn't
 3368+ # exist, we might legitimately not create it, but it's simpler to just
 3369+ # create it and then give it a negative value, since the value is bogus
 3370+ # anyway.
 3371+ #
 3372+ # Sometimes I wish we had INSERT ... ON DUPLICATE KEY UPDATE.
 3373+ $insertCats = array_merge( $added, $deleted );
 3374+ $insertRows = array();
 3375+ foreach( $insertCats as $cat ) {
 3376+ $insertRows []= array( 'cat_title' => $cat );
 3377+ }
 3378+ $dbw->insert( 'category', $insertRows, __METHOD__, 'IGNORE' );
 3379+
 3380+ $addFields = array( 'cat_pages = cat_pages + 1' );
 3381+ $removeFields = array( 'cat_pages = cat_pages - 1' );
 3382+ if( $ns == NS_CATEGORY ) {
 3383+ $addFields []= 'cat_subcats = cat_subcats + 1';
 3384+ $removeFields []= 'cat_subcats = cat_subcats - 1';
 3385+ } elseif( $ns == NS_IMAGE ) {
 3386+ $addFields []= 'cat_files = cat_files + 1';
 3387+ $removeFields []= 'cat_files = cat_files - 1';
 3388+ }
 3389+
 3390+ $dbw->update(
 3391+ 'category',
 3392+ $addFields,
 3393+ array( 'cat_title' => $added ),
 3394+ __METHOD__
 3395+ );
 3396+ $dbw->update(
 3397+ 'category',
 3398+ $removeFields,
 3399+ array( 'cat_title' => $deleted ),
 3400+ __METHOD__
 3401+ );
 3402+ }
33443403 }
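
Review note: updateCategoryCounts() always adjusts cat_pages and additionally adjusts cat_subcats or cat_files depending on the namespace of the page being changed. A minimal sketch of that bookkeeping on plain arrays follows. The function names and the SKETCH_ constants are hypothetical (the values 14 and 6 mirror MediaWiki's category and image namespaces), and no database is involved.

```php
<?php
// Hypothetical constants so the sketch does not depend on MediaWiki's own.
const SKETCH_NS_IMAGE = 6;
const SKETCH_NS_CATEGORY = 14;

/**
 * Columns to adjust, and by how much, when a page in namespace $ns is
 * added to ($sign = 1) or removed from ($sign = -1) a category.
 */
function categoryDeltas( $ns, $sign ) {
	$deltas = array( 'cat_pages' => $sign );
	if ( $ns === SKETCH_NS_CATEGORY ) {
		$deltas['cat_subcats'] = $sign;
	} elseif ( $ns === SKETCH_NS_IMAGE ) {
		$deltas['cat_files'] = $sign;
	}
	return $deltas;
}

/** Apply the deltas to one in-memory category row. */
function applyDeltas( array $row, array $deltas ) {
	foreach ( $deltas as $column => $delta ) {
		$row[$column] += $delta;
	}
	return $row;
}
```

Removing a file from a freshly inserted, all-zero row yields negative values, which is the visible underflow that the signed columns in tables.sql are meant to expose.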
Index: trunk/phase3/includes/LinksUpdate.php
@@ -124,8 +124,11 @@
125125 $this->getCategoryInsertions( $existing ) );
126126
127127 # Invalidate all categories which were added, deleted or changed (set symmetric difference)
128 - $categoryUpdates = array_diff_assoc( $existing, $this->mCategories ) + array_diff_assoc( $this->mCategories, $existing );
 128+ $categoryInserts = array_diff_assoc( $this->mCategories, $existing );
 129+ $categoryDeletes = array_diff_assoc( $existing, $this->mCategories );
 130+ $categoryUpdates = $categoryInserts + $categoryDeletes;
129131 $this->invalidateCategories( $categoryUpdates );
 132+ $this->updateCategoryCounts( $categoryInserts, $categoryDeletes );
130133
131134 # Page properties
132135 $existing = $this->getExistingProperties();
@@ -155,7 +158,9 @@
156159
157160 # Refresh category pages and image description pages
158161 $existing = $this->getExistingCategories();
159 - $categoryUpdates = array_diff_assoc( $existing, $this->mCategories ) + array_diff_assoc( $this->mCategories, $existing );
 162+ $categoryInserts = array_diff_assoc( $this->mCategories, $existing );
 163+ $categoryDeletes = array_diff_assoc( $existing, $this->mCategories );
 164+ $categoryUpdates = $categoryInserts + $categoryDeletes;
160165 $existing = $this->getExistingImages();
161166 $imageUpdates = array_diff_key( $existing, $this->mImages ) + array_diff_key( $this->mImages, $existing );
162167
@@ -167,8 +172,10 @@
168173 $this->dumbTableUpdate( 'langlinks', $this->getInterlangInsertions(),'ll_from' );
169174 $this->dumbTableUpdate( 'page_props', $this->getPropertyInsertions(), 'pp_page' );
170175
171 - # Update the cache of all the category pages and image description pages which were changed
 176+ # Update the cache of all the category pages and image description
 177+ # pages which were changed, and fix the category table count
172178 $this->invalidateCategories( $categoryUpdates );
 179+ $this->updateCategoryCounts( $categoryInserts, $categoryDeletes );
173180 $this->invalidateImageDescriptions( $imageUpdates );
174181
175182 # Refresh links of all pages including this page
@@ -261,6 +268,18 @@
262269 $this->invalidatePages( NS_CATEGORY, array_keys( $cats ) );
263270 }
264271
 272+ /**
 273+ * Update all the appropriate counts in the category table.
 274+ * @param $added associative array of category name => sort key
 275+ * @param $deleted associative array of category name => sort key
 276+ */
 277+ function updateCategoryCounts( $added, $deleted ) {
 278+ $a = new Article($this->mTitle);
 279+ $a->updateCategoryCounts(
 280+ array_keys( $added ), array_keys( $deleted ), $this->mDb
 281+ );
 282+ }
 283+
265284 function invalidateImageDescriptions( $images ) {
266285 $this->invalidatePages( NS_IMAGE, array_keys( $images ) );
267286 }
@@ -268,9 +287,9 @@
269288 function dumbTableUpdate( $table, $insertions, $fromField ) {
270289 $this->mDb->delete( $table, array( $fromField => $this->mId ), __METHOD__ );
271290 if ( count( $insertions ) ) {
272 - # The link array was constructed without FOR UPDATE, so there may be collisions
273 - # This may cause minor link table inconsistencies, which is better than
274 - # crippling the site with lock contention.
 291+ # The link array was constructed without FOR UPDATE, so there may
 292+ # be collisions. This may cause minor link table inconsistencies,
 293+ # which is better than crippling the site with lock contention.
275294 $this->mDb->insert( $table, $insertions, __METHOD__, array( 'IGNORE' ) );
276295 }
277296 }
Index: trunk/phase3/includes/AutoLoader.php
@@ -25,7 +25,9 @@
2626 'BagOStuff' => 'includes/BagOStuff.php',
2727 'Block' => 'includes/Block.php',
2828 'BrokenRedirectsPage' => 'includes/SpecialBrokenRedirects.php',
 29+ 'Category' => 'includes/Category.php',
2930 'Categoryfinder' => 'includes/Categoryfinder.php',
 31+ 'CategoryList' => 'includes/Category.php',
3032 'CategoryPage' => 'includes/CategoryPage.php',
3133 'CategoryViewer' => 'includes/CategoryPage.php',
3234 'ChangesList' => 'includes/ChangesList.php',
Index: trunk/phase3/includes/Category.php
@@ -0,0 +1,305 @@
 2+<?php
 3+/**
 4+ * Two classes, Category and CategoryList, to deal with categories. To reduce
 5+ * code duplication, most of the logic is implemented for lists of categories,
 6+ * and then single categories are a special case. We use a separate class for
 7+ * CategoryList so as to discourage stupid slow memory-hogging stuff like manu-
 8+ * ally iterating through arrays of Titles and Articles, which we do way too
 9+ * much, when a smarter class can do stuff all in one query.
 10+ *
 11+ * Category(List) objects are immutable, strictly speaking. If you call me-
 12+ * thods that change the database, like to refresh link counts, the objects
 13+ * will be appropriately reinitialized. Member variables are lazy-initialized.
 14+ *
 15+ * TODO: Move some stuff from CategoryPage.php to here, and use that.
 16+ *
 17+ * @author Simetrical
 18+ */
 19+
 20+abstract class CategoryListBase {
 21+ # FIXME: Is storing all member variables as simple arrays a good idea?
 22+ # Should we use some kind of associative array instead?
 23+ /** Names of all member categories, normalized to DB-key form */
 24+ protected $mNames = null;
 25+ /** IDs of all member categories */
 26+ protected $mIDs = null;
 27+ /**
 28+ * Counts of membership (cat_pages, cat_subcats, cat_files) for all member
 29+ * categories
 30+ */
 31+ protected $mPages = null, $mSubcats = null, $mFiles = null;
 32+
 33+ protected function __construct() {}
 34+
 35+ /** See CategoryList::newFromNames for details. */
 36+ protected function setNames( $names ) {
 37+ if( !is_array( $names ) ) {
 38+ throw new MWException( __METHOD__.' passed non-array' );
 39+ }
 40+ $this->mNames = array_diff(
 41+ array_map(
 42+ array( 'CategoryListBase', 'setNamesCallback' ),
 43+ $names
 44+ ),
 45+ array( false )
 46+ );
 47+ }
 48+
 49+ /**
 50+ * @param string $name Name of a putative category
 51+ * @return mixed Normalized name, or false if the name was invalid.
 52+ */
 53+ private static function setNamesCallback( $name ) {
 54+ $title = Title::newFromText( $name );
 55+ if( !is_object( $title ) ) {
 56+ return false;
 57+ }
 58+ return $title->getDBKey();
 59+ }
 60+
 61+ /**
 62+ * Set up all member variables using a database query.
 63+ * @return bool True on success, false on failure.
 64+ */
 65+ protected function initialize() {
 66+ if( $this->mNames === null && $this->mIDs === null ) {
 67+ throw new MWException( __METHOD__.' has both names and IDs null' );
 68+ }
 69+ $dbr = wfGetDB( DB_SLAVE );
 70+ if( $this->mIDs === null ) {
 71+ $where = array( 'cat_title' => $this->mNames );
 72+ } elseif( $this->mNames === null ) {
 73+ $where = array( 'cat_id' => $this->mIDs );
 74+ } else {
 75+ # Already initialized
 76+ return true;
 77+ }
 78+ $res = $dbr->select(
 79+ 'category',
 80+ array( 'cat_id', 'cat_title', 'cat_pages', 'cat_subcats',
 81+ 'cat_files' ),
 82+ $where,
 83+ __METHOD__
 84+ );
 85+ if( !$res->fetchRow() ) {
 86+ # Okay, there were no contents. Nothing to initialize.
 87+ return false;
 88+ }
 89+ $res->rewind();
 90+ $this->mIDs = $this->mNames = $this->mPages = $this->mSubcats =
 91+ $this->mFiles = array();
 92+ while( $row = $res->fetchRow() ) {
 93+ $this->mIDs []= $row['cat_id'];
 94+ $this->mNames []= $row['cat_title'];
 95+ $this->mPages []= $row['cat_pages'];
 96+ $this->mSubcats []= $row['cat_subcats'];
 97+ $this->mFiles []= $row['cat_files'];
 98+ }
 99+ $res->free();
 100+ }
 101+}
 102+
 103+/** @todo make iterable. */
 104+class CategoryList extends CategoryListBase {
 105+ /**
 106+ * Factory function. Any provided elements that don't correspond to a cat-
 107+ * egory that actually exists will be silently dropped. FIXME: Is this
 108+ * sane error-handling?
 109+ *
 110+ * @param array $names An array of category names. They need not be norma-
 111+ * lized, with spaces replaced by underscores.
 112+ * @return CategoryList
 113+ */
 114+ public static function newFromNames( $names ) {
 115+ $cat = new self();
 116+ $cat->setNames( $names );
 117+ return $cat;
 118+ }
 119+
 120+ /**
 121+ * Factory function. Any provided elements that don't correspond to a cat-
 122+ * egory that actually exists will be silently dropped. FIXME: Is this
 123+ * sane error-handling?
 124+ *
 125+ * @param array $ids An array of category ids
 126+ * @return CategoryList
 127+ */
 128+ public static function newFromIDs( $ids ) {
 129+ if( !is_array( $ids ) ) {
 130+ throw new MWException( __METHOD__.' passed non-array' );
 131+ }
 132+ $cat = new self();
 133+ $cat->mIDs = $ids;
 134+ return $cat;
 135+ }
 136+
 137+ /** @return array Simple array of DB key names */
 138+ public function getNames() {
 139+ $this->initialize();
 140+ return $this->mNames;
 141+ }
 142+ /**
 143+ * FIXME: Is this a good return type?
 144+ *
 145+ * @return array Associative array of DB key name => ID
 146+ */
 147+ public function getIDs() {
 148+ $this->initialize();
 149+ return array_combine( $this->mNames, $this->mIDs );
 150+ }
 151+ /**
 152+ * FIXME: Is this a good return type?
 153+ *
 154+ * @return array Associative array of DB key name => array(pages, subcats,
 155+ * files)
 156+ */
 157+ public function getCounts() {
 158+ $this->initialize();
 159+ $ret = array();
 160+ foreach( array_keys( $this->mNames ) as $i ) {
 161+ $ret[$this->mNames[$i]] = array(
 162+ $this->mPages[$i],
 163+ $this->mSubcats[$i],
 164+ $this->mFiles[$i]
 165+ );
 166+ }
 167+ return $ret;
 168+ }
 169+}
 170+
 171+class Category extends CategoryListBase {
 172+ /**
 173+ * Factory function.
 174+ *
 175+ * @param string $name A category name (no "Category:" prefix). It need
 176+ * not be normalized, with spaces replaced by underscores.
 177+ * @return mixed Category, or false on a totally invalid name
 178+ */
 179+ public static function newFromName( $name ) {
 180+ $cat = new self();
 181+ $cat->setNames( array( $name ) );
 182+ if( count( $cat->mNames ) !== 1 ) {
 183+ return false;
 184+ }
 185+ return $cat;
 186+ }
 187+
 188+ /**
 189+ * Factory function.
 190+ *
 191+ * @param int $id A category id
 192+ * @return Category
 193+ */
 194+ public static function newFromIDs( $id ) {
 195+ $cat = new self();
 196+ $cat->mIDs = array( $id );
 197+ return $cat;
 198+ }
 199+
 200+ /** @return mixed DB key name, or false on failure */
 201+ public function getName() { return $this->getX( 'mNames' ); }
 202+ /** @return mixed Category ID, or false on failure */
 203+ public function getID() { return $this->getX( 'mIDs' ); }
 204+ /** @return mixed Total number of member pages, or false on failure */
 205+ public function getPageCount() { return $this->getX( 'mPages' ); }
 206+ /** @return mixed Number of subcategories, or false on failure */
 207+ public function getSubcatCount() { return $this->getX( 'mSubcats' ); }
 208+ /** @return mixed Number of member files, or false on failure */
 209+ public function getFileCount() { return $this->getX( 'mFiles' ); }
 210+ /**
 211+ * This is not implemented in the base class, because arrays of Titles are
 212+ * evil.
 213+ *
 214+ * @return mixed The Title for this category, or false on failure.
 215+ */
 216+ public function getTitle() {
 217+ if( !$this->initialize() ) {
 218+ return false;
 219+ }
 220+ # FIXME is there a better way to do this?
 221+ return Title::newFromText( "Category:{$this->mNames[0]}" );
 222+ }
 223+
 224+ /** Generic accessor */
 225+ private function getX( $key ) {
 226+ if( !$this->initialize() ) {
 227+ return false;
 228+ }
 229+ return $this->{$key}[0];
 230+ }
 231+
 232+ /**
 233+ * Override the parent class so that we can return false if things muck
 234+ * up, i.e., the name/ID we got was invalid. Currently CategoryList si-
 235+ * lently eats errors so as not to kill the whole array for one bad name.
 236+ *
 237+ * @return bool True on success, false on failure.
 238+ */
 239+ protected function initialize() {
 240+ parent::initialize();
 241+ if( count( $this->mNames ) != 1 || count( $this->mIDs ) != 1 ) {
 242+ return false;
 243+ }
 244+ return true;
 245+ }
 246+
 247+ /**
 248+ * Refresh the counts for this category.
 249+ *
 250+ * FIXME: If there were some way to do this in MySQL 4 without an UPDATE
 251+ * for every row, it would be nice to move this to the parent class.
 252+ *
 253+ * @return bool True on success, false on failure
 254+ */
 255+ public function refreshCounts() {
 256+ if( wfReadOnly() ) {
 257+ return false;
 258+ }
 259+ $dbw = wfGetDB( DB_MASTER );
 260+ $dbw->begin();
 261+ # Note, we must use names for this, since categorylinks does.
 262+ if( $this->mNames === null ) {
 263+ if( !$this->initialize() ) {
 264+ return false;
 265+ }
 266+ } else {
 267+ # Let's be sure that the row exists in the table. We don't need to
 268+ # do this if we got the row from the table in initialization!
 269+ $dbw->insert(
 270+ 'category',
 271+ array( 'cat_title' => $this->mNames[0] ),
 272+ __METHOD__,
 273+ 'IGNORE'
 274+ );
 275+ }
 276+
 277+ $result = $dbw->selectRow(
 278+ array( 'categorylinks', 'page' ),
 279+ array( 'COUNT(*) AS pages',
 280+ 'COUNT(IF(page_namespace='.NS_CATEGORY.',1,NULL)) AS subcats',
 281+ 'COUNT(IF(page_namespace='.NS_IMAGE.',1,NULL)) AS files'
 282+ ),
 283+ array( 'cl_to' => $this->mNames[0], 'page_id = cl_from' ),
 284+ __METHOD__,
 285+ 'LOCK IN SHARE MODE'
 286+ );
 287+ $ret = $dbw->update(
 288+ 'category',
 289+ array(
 290+ 'cat_pages' => $result->pages,
 291+ 'cat_subcats' => $result->subcats,
 292+ 'cat_files' => $result->files
 293+ ),
 294+ array( 'cat_title' => $this->mNames[0] ),
 295+ __METHOD__
 296+ );
 297+ $dbw->commit();
 298+
 299+ # Now we should update our local counts.
 300+ $this->mPages = array( $result->pages );
 301+ $this->mSubcats = array( $result->subcats );
 302+ $this->mFiles = array( $result->files );
 303+
 304+ return $ret;
 305+ }
 306+}
Property changes on: trunk/phase3/includes/Category.php
___________________________________________________________________
Added: svn:eol-style
1307 + native
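
Review note: CategoryList::getIDs() is documented as returning an associative array of DB key name => ID, built from two parallel arrays. The sketch below shows how PHP's array_combine and array_fill_keys differ for that job. The sample names and IDs are made up for illustration.

```php
<?php
$names = array( 'Cats', 'Dogs' );
$ids = array( 7, 9 );

// array_combine pairs the i-th key with the i-th value.
$paired = array_combine( $names, $ids );

// array_fill_keys gives every key the same value, here the whole $ids array.
$filled = array_fill_keys( $names, $ids );
```

Only array_combine produces the name => ID shape that the doc comment promises.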
Index: trunk/phase3/languages/messages/MessagesEn.php
@@ -2416,16 +2416,20 @@
24172417 'nocredits' => 'There is no credits info available for this page.',
24182418
24192419 # Spam protection
2420 -'spamprotectiontitle' => 'Spam protection filter',
2421 -'spamprotectiontext' => 'The page you wanted to save was blocked by the spam filter. This is probably caused by a link to an external site.',
2422 -'spamprotectionmatch' => 'The following text is what triggered our spam filter: $1',
2423 -'subcategorycount' => 'There {{PLURAL:$1|is one subcategory|are $1 subcategories}} to this category.',
2424 -'categoryarticlecount' => 'There {{PLURAL:$1|is one page|are $1 pages}} in this category.',
2425 -'category-media-count' => 'There {{PLURAL:$1|is one file|are $1 files}} in this category.',
2426 -'listingcontinuesabbrev' => 'cont.',
2427 -'spambot_username' => 'MediaWiki spam cleanup',
2428 -'spam_reverting' => 'Reverting to last version not containing links to $1',
2429 -'spam_blanking' => 'All revisions contained links to $1, blanking',
 2420+'spamprotectiontitle' => 'Spam protection filter',
 2421+'spamprotectiontext' => 'The page you wanted to save was blocked by the spam filter. This is probably caused by a link to an external site.',
 2422+'spamprotectionmatch' => 'The following text is what triggered our spam filter: $1',
 2423+'subcategorycount' => 'There {{PLURAL:$1|is one subcategory|are $1 subcategories}} to this category.',
 2424+'category-subcat-count' => '{{PLURAL:$2|This category has only the following subcategory.|This category has the following {{PLURAL:$1|subcategory|$1 subcategories}}, out of $2 total.}}',
 2425+'category-subcat-count-limited' => 'This category has the following {{PLURAL:$1|subcategory|$1 subcategories}}.',
 2426+'category-article-count' => '{{PLURAL:$2|This category contains only the following page.|The following {{PLURAL:$1|page is|$1 pages are}} in this category, out of $2 total.}}',
 2427+'category-article-count-limited' => 'The following {{PLURAL:$1|page is|$1 pages are}} in the current category.',
 2428+'category-file-count' => '{{PLURAL:$2|This category contains only the following file.|The following {{PLURAL:$1|file is|$1 files are}} in this category, out of $2 total.}}',
 2429+'category-file-count-limited' => 'The following {{PLURAL:$1|file is|$1 files are}} in the current category.',
 2430+'listingcontinuesabbrev' => 'cont.',
 2431+'spambot_username' => 'MediaWiki spam cleanup',
 2432+'spam_reverting' => 'Reverting to last version not containing links to $1',
 2433+'spam_blanking' => 'All revisions contained links to $1, blanking',
24302434
24312435 # Info page
24322436 'infosubtitle' => 'Information for page',
Index: trunk/phase3/RELEASE-NOTES
@@ -46,6 +46,8 @@
4747 link on diffs
4848 * Magic word formatnum can now take raw suffix to undo formatting
4949 * Add updatelog table to reliably permit updates that don't change the schema
 50+* Add category table to allow better tracking of category membership counts
 51+** (bug 1212) Give correct membership counts on the pages of large categories
5052
5153 === Bug fixes in 1.13 ===
5254

Follow-up revisions

Revision | Commit summary | Author | Date
r32092 | Fix for r32085: Use the correct message names | raymond | 05:57, 18 March 2008
r32093 | Fix for r32085:... | raymond | 06:10, 18 March 2008
r32212 | Easter housekeeping:... | raymond | 08:20, 20 March 2008
r72547 | Follow-up r32085. Delay the transaction begin until after the object is initi... | platonides | 20:11, 7 September 2010

Comments

#Comment by Liamzebedee (talk | contribs)   11:50, 11 March 2013

This change introduced [https://bugzilla.wikimedia.org/show_bug.cgi?id=45650 BUG 45650] which can be traced to trunk/phase3/includes/Category.php:285.

The bug currently makes this code unusable on SQLite-based installations.
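
Line 285 of Category.php in this revision passes 'LOCK IN SHARE MODE' as a select option. That clause is MySQL syntax, and SQLite has no equivalent clause, which would explain the failure. One possible shape for a guard is sketched below. The function lockOptionsFor and the backend strings are hypothetical, and this is not necessarily the fix that was adopted.

```php
<?php
/**
 * Return the select options to use for the counting query.
 * Only MySQL gets the shared-lock clause; other backends get none.
 *
 * @param string $dbType Hypothetical backend identifier, e.g. 'mysql' or 'sqlite'
 * @return array Options to pass along with the SELECT
 */
function lockOptionsFor( $dbType ) {
	return $dbType === 'mysql' ? array( 'LOCK IN SHARE MODE' ) : array();
}
```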
