r88113 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r88112‎ | r88113 | r88114 >
Date:17:11, 14 May 2011
Author:ialex
Status:resolved (Comments)
Tags:
Comment:
Rewrote the article counting code and related:
* (bug 26033, bug 24754) Added $wgArticleCountMethod to have a more flexible way to define which method to use to define if a page is an article or not and deprecated $wgUseCommaCount. There is now a new 'any' method to count any article that is in a content namespace and not a redirect.
* (bug 11868) If using links to count articles, Article::isCountable() will now use the ParserOutput to check if there's a link instead of checking for the "[[" string. Changed Article::isCountable() to take a stdObject or false for the first parameters. If false is passed, the result will be based on the current article's state (i.e. database). The only call outside of the Article class is in DeleteAction (including extensions).
* Removed this horror of Article::$mGoodAdjustment and Article::$mTotalAdjustment, replaced by the new $created parameter on Article::editUpdates(); simplified Article::createUpdates()
* Updated Import.php to take advantage of the new parameter and make a single call to Article::editUpdates()
Modified paths:
  • /trunk/phase3/RELEASE-NOTES-1.19 (modified) (history)
  • /trunk/phase3/includes/Article.php (modified) (history)
  • /trunk/phase3/includes/DefaultSettings.php (modified) (history)
  • /trunk/phase3/includes/Import.php (modified) (history)
  • /trunk/phase3/includes/Setup.php (modified) (history)
  • /trunk/phase3/includes/SiteStats.php (modified) (history)
  • /trunk/phase3/includes/actions/DeleteAction.php (modified) (history)

Diff [purge]

Index: trunk/phase3/RELEASE-NOTES-1.19
@@ -16,11 +16,15 @@
1717 * $wgAllowUserSkin (deprecated in 1.16) has now been removed
1818 * $wgExtraRandompageSQL (deprecated in 1.16) has now been removed
1919 * LogReader and LogViewer classes (deprecated in 1.14) have now been removed
 20+* (bug 26033) Added $wgArticleCountMethod to select the method to use to say
 21+ whether a page is an article or not. $wgUseCommaCount is now deprecated.
2022
2123 === New features in 1.19 ===
2224 * (bug 28916) A way to to toggle mw.config legacy globals settings from
2325 LocalSettings.php has been created by introducing $wgLegacyJavaScriptGlobals.
2426 * (bug 28503) Support for ircs:// URL protocols
 27+* (bug 26033) It is now possible to count all non-redirect pages in content
 28+ namespaces as articles
2529
2630 === Bug fixes in 1.19 ===
2731 * (bug 10154) Don't allow user to specify days beyond $wgRCMaxAge.
@@ -38,6 +42,8 @@
3943 * (bug 27864) Transcluding {{Special:Prefix}} with empty prefix now lists all
4044 pages.
4145 * (bug 18803) JPEG2000 images can no longer be uploaded as JPEG image.
 46+* (bug 11868) If using links to count articles, the checking will now be based
 47+ on the real presence of an internal link instead of the "[[" string
4248
4349 === API changes in 1.19 ===
4450 * (bug 27790) add query type for querymodules to action=paraminfo
Index: trunk/phase3/includes/DefaultSettings.php
@@ -2974,15 +2974,31 @@
29752975 */
29762976
29772977 /**
2978 - * Under which condition should a page in the main namespace be counted
2979 - * as a valid article? If $wgUseCommaCount is set to true, it will be
2980 - * counted if it contains at least one comma. If it is set to false
2981 - * (default), it will only be counted if it contains at least one [[wiki
2982 - * link]]. See http://www.mediawiki.org/wiki/Manual:Article_count
 2978+ * Method used to determine if a page in a content namespace should be counted
 2979+ * as a valid article.
29832980 *
2984 - * Retroactively changing this variable will not affect
2985 - * the existing count (cf. maintenance/recount.sql).
 2981+ * Redirect pages will never be counted as valid articles.
 2982+ *
 2983+ * This variable can have the following values:
 2984+ * - 'any': all pages as considered as valid articles
 2985+ * - 'comma': the page must contain a comma to be considered valid
 2986+ * - 'link': the page must contain a [[wiki link]] to be considered valid
 2987+ * - null: the value will be set at run time depending on $wgUseCommaCount:
 2988+ * if $wgUseCommaCount is false, it will be 'link', if it is true
 2989+ * it will be 'comma'
 2990+ *
 2991+ * See also See http://www.mediawiki.org/wiki/Manual:Article_count
 2992+ *
 2993+ * Retroactively changing this variable will not affect the existing count,
 2994+ * to update it, you will need to run the maintenance/updateArticleCount.php
 2995+ * script.
29862996 */
 2997+$wgArticleCountMethod = null;
 2998+
 2999+/**
 3000+ * Backward compatibility setting, will set $wgArticleCountMethod if it is null.
 3001+ * @deprecated in 1.19; use $wgArticleCountMethod instead
 3002+ */
29873003 $wgUseCommaCount = false;
29883004
29893005 /**
Index: trunk/phase3/includes/Import.php
@@ -1027,25 +1027,25 @@
10281028 $tempTitle = $GLOBALS['wgTitle'];
10291029 $GLOBALS['wgTitle'] = $this->title;
10301030
1031 - if( $created ) {
 1031+ if ( $created ) {
10321032 wfDebug( __METHOD__ . ": running onArticleCreate\n" );
10331033 Article::onArticleCreate( $this->title );
1034 -
1035 - wfDebug( __METHOD__ . ": running create updates\n" );
1036 - $article->createUpdates( $revision );
1037 -
10381034 } elseif( $changed ) {
10391035 wfDebug( __METHOD__ . ": running onArticleEdit\n" );
10401036 Article::onArticleEdit( $this->title );
 1037+ }
10411038
1042 - wfDebug( __METHOD__ . ": running edit updates\n" );
1043 - $article->editUpdates(
1044 - $this->getText(),
1045 - $this->getComment(),
1046 - $this->minor,
1047 - $this->timestamp,
1048 - $revId );
1049 - }
 1039+ wfDebug( __METHOD__ . ": running updates\n" );
 1040+ $article->editUpdates(
 1041+ $this->getText(),
 1042+ $this->getComment(),
 1043+ $this->minor,
 1044+ $this->timestamp,
 1045+ $revId,
 1046+ true,
 1047+ null,
 1048+ $created );
 1049+
10501050 $GLOBALS['wgTitle'] = $tempTitle;
10511051
10521052 return true;
Index: trunk/phase3/includes/Article.php
@@ -27,7 +27,6 @@
2828 var $mContentLoaded = false; // !<
2929 var $mCounter = -1; // !< Not loaded
3030 var $mDataLoaded = false; // !<
31 - var $mGoodAdjustment = 0; // !<
3231 var $mIsRedirect = false; // !<
3332 var $mLatest = false; // !<
3433 var $mOldId; // !<
@@ -61,7 +60,6 @@
6261
6362 var $mTimestamp = ''; // !<
6463 var $mTitle; // !< Title object
65 - var $mTotalAdjustment = 0; // !<
6664 var $mTouched = '19700101000000'; // !<
6765
6866 /**
@@ -260,7 +258,6 @@
261259 $this->mRedirectTarget = null; # Title object if set
262260 $this->mLastRevision = null; # Latest revision
263261 $this->mTimestamp = '';
264 - $this->mGoodAdjustment = $this->mTotalAdjustment = 0;
265262 $this->mTouched = '19700101000000';
266263 $this->mIsRedirect = false;
267264 $this->mRevIdFetched = 0;
@@ -644,15 +641,43 @@
645642 * Determine whether a page would be suitable for being counted as an
646643 * article in the site_stats table based on the title & its content
647644 *
648 - * @param $text String: text to analyze
649 - * @return bool
 645+ * @param $editInfo Object or false: object returned by prepareTextForEdit(),
 646+ * if false, the current database state will be used
 647+ * @return Boolean
650648 */
651 - public function isCountable( $text ) {
652 - global $wgUseCommaCount;
 649+ public function isCountable( $editInfo = false ) {
 650+ global $wgArticleCountMethod;
653651
654 - $token = $wgUseCommaCount ? ',' : '[[';
 652+ if ( !$this->mTitle->isContentPage() ) {
 653+ return false;
 654+ }
655655
656 - return $this->mTitle->isContentPage() && !$this->isRedirect( $text ) && in_string( $token, $text );
 656+ $text = $editInfo ? $editInfo->pst : false;
 657+
 658+ if ( $this->isRedirect( $text ) ) {
 659+ return false;
 660+ }
 661+
 662+ switch ( $wgArticleCountMethod ) {
 663+ case 'any':
 664+ return true;
 665+ case 'comma':
 666+ if ( $text === false ) {
 667+ $text = $this->getRawText();
 668+ }
 669+ return in_string( ',', $text );
 670+ case 'link':
 671+ if ( $editInfo ) {
 672+ // ParserOutput::getLinks() is a 2D array of page links, so
 673+ // to be really correct we would need to recurse in the array
 674+ // but the main array should only have items in it if there are
 675+ // links.
 676+ return (bool)count( $editInfo->output->getLinks() );
 677+ } else {
 678+ return (bool)wfGetDB( DB_SLAVE )->selectField( 'pagelinks', 1,
 679+ array( 'pl_from' => $this->getId() ), __METHOD__ );
 680+ }
 681+ }
657682 }
658683
659684 /**
@@ -2067,10 +2092,6 @@
20682093 $changed = ( strcmp( $text, $oldtext ) != 0 );
20692094
20702095 if ( $changed ) {
2071 - $this->mGoodAdjustment = (int)$this->isCountable( $text )
2072 - - (int)$this->isCountable( $oldtext );
2073 - $this->mTotalAdjustment = 0;
2074 -
20752096 if ( !$this->mLatest ) {
20762097 # Article gone missing
20772098 wfDebug( __METHOD__ . ": EDIT_UPDATE specified but article doesn't exist\n" );
@@ -2165,12 +2186,6 @@
21662187 # Create new article
21672188 $status->value['new'] = true;
21682189
2169 - # Set statistics members
2170 - # We work out if it's countable after PST to avoid counter drift
2171 - # when articles are created with {{subst:}}
2172 - $this->mGoodAdjustment = (int)$this->isCountable( $text );
2173 - $this->mTotalAdjustment = 1;
2174 -
21752190 $dbw->begin();
21762191
21772192 # Add the page record; stake our claim on this title!
@@ -2226,7 +2241,7 @@
22272242 $dbw->commit();
22282243
22292244 # Update links, etc.
2230 - $this->editUpdates( $text, $summary, $isminor, $now, $revisionId, true, $user );
 2245+ $this->editUpdates( $text, $summary, $isminor, $now, $revisionId, true, $user, true );
22312246
22322247 # Clear caches
22332248 Article::onArticleCreate( $this->mTitle );
@@ -3064,7 +3079,7 @@
30653080 return false;
30663081 }
30673082
3068 - $u = new SiteStatsUpdate( 0, 1, - (int)$this->isCountable( $this->getRawText() ), -1 );
 3083+ $u = new SiteStatsUpdate( 0, 1, - (int)$this->isCountable(), -1 );
30693084 array_push( $wgDeferredUpdateList, $u );
30703085
30713086 // Bitfields to further suppress the content
@@ -3511,8 +3526,11 @@
35123527 * @param $newid Integer: rev_id value of the new revision
35133528 * @param $changed Boolean: Whether or not the content actually changed
35143529 * @param $user User object: User doing the edit
 3530+ * @param $created Boolean: Whether the edit created the page
35153531 */
3516 - public function editUpdates( $text, $summary, $minoredit, $timestamp_of_pagechange, $newid, $changed = true, User $user = null ) {
 3532+ public function editUpdates( $text, $summary, $minoredit, $timestamp_of_pagechange, $newid,
 3533+ $changed = true, User $user = null, $created = false )
 3534+ {
35173535 global $wgDeferredUpdateList, $wgUser, $wgEnableParserCache;
35183536
35193537 wfProfileIn( __METHOD__ );
@@ -3564,11 +3582,20 @@
35653583 return;
35663584 }
35673585
3568 - $u = new SiteStatsUpdate( 0, 1, $this->mGoodAdjustment, $this->mTotalAdjustment );
3569 - array_push( $wgDeferredUpdateList, $u );
3570 - $u = new SearchUpdate( $id, $title, $text );
3571 - array_push( $wgDeferredUpdateList, $u );
 3586+ if ( !$changed ) {
 3587+ $good = 0;
 3588+ $total = 0;
 3589+ } elseif ( $created ) {
 3590+ $good = (int)$this->isCountable( $editInfo );
 3591+ $total = 1;
 3592+ } else {
 3593+ $good = (int)$this->isCountable( $editInfo ) - (int)$this->isCountable();
 3594+ $total = 0;
 3595+ }
35723596
 3597+ $wgDeferredUpdateList[] = new SiteStatsUpdate( 0, 1, $good, $total );
 3598+ $wgDeferredUpdateList[] = new SearchUpdate( $id, $title, $text );
 3599+
35733600 # If this is another user's talk page, update newtalk
35743601 # Don't do this if $changed = false otherwise some idiot can null-edit a
35753602 # load of user talk pages and piss people off, nor if it's a minor edit
@@ -3608,10 +3635,8 @@
36093636 * anymore.
36103637 */
36113638 public function createUpdates( $rev ) {
3612 - $this->mGoodAdjustment = $this->isCountable( $rev->getText() );
3613 - $this->mTotalAdjustment = 1;
36143639 $this->editUpdates( $rev->getText(), $rev->getComment(),
3615 - $rev->isMinor(), wfTimestamp(), $rev->getId(), true );
 3640+ $rev->isMinor(), wfTimestamp(), $rev->getId(), true, null, true );
36163641 }
36173642
36183643 /**
Index: trunk/phase3/includes/SiteStats.php
@@ -285,18 +285,24 @@
286286 * @return Integer
287287 */
288288 public function articles() {
289 - global $wgUseCommaCount;
 289+ global $wgArticleCountMethod;
290290
291291 $tables = array( 'page' );
292292 $conds = array(
293293 'page_namespace' => MWNamespace::getContentNamespaces(),
294294 'page_is_redirect' => 0,
295 - 'page_len > 0'
296295 );
297296
298 - if ( !$wgUseCommaCount ) {
 297+ if ( $wgArticleCountMethod == 'link' ) {
299298 $tables[] = 'pagelinks';
300299 $conds[] = 'pl_from=page_id';
 300+ } elseif ( $wgArticleCountMethod == 'comma' ) {
 301+ // To make a correct check for this, we would need, for each page,
 302+ // to load the text, maybe uncompress it, maybe decode it and then
 303+ // check if there's one comma.
 304+ // But one thing we are sure is that if the page is empty, it can't
 305+ // contain a comma :)
 306+ $conds[] = 'page_len > 0';
301307 }
302308
303309 $this->mArticles = $this->db->selectField( $tables, 'COUNT(DISTINCT page_id)',
Index: trunk/phase3/includes/Setup.php
@@ -293,6 +293,10 @@
294294 # Blacklisted file extensions shouldn't appear on the "allowed" list
295295 $wgFileExtensions = array_diff ( $wgFileExtensions, $wgFileBlacklist );
296296
 297+if ( $wgArticleCountMethod === null ) {
 298+ $wgArticleCountMethod = $wgUseCommaCount ? 'comma' : 'link';
 299+}
 300+
297301 if ( $wgInvalidateCacheOnLocalSettingsChange ) {
298302 $wgCacheEpoch = max( $wgCacheEpoch, gmdate( 'YmdHis', @filemtime( "$IP/LocalSettings.php" ) ) );
299303 }
Index: trunk/phase3/includes/actions/DeleteAction.php
@@ -245,7 +245,7 @@
246246 return false;
247247 }
248248
249 - $updates = new SiteStatsUpdate( 0, 1, - (int)$page->isCountable( $page->getRawText() ), -1 );
 249+ $updates = new SiteStatsUpdate( 0, 1, - (int)$page->isCountable(), -1 );
250250 array_push( $wgDeferredUpdateList, $updates );
251251
252252 // Bitfields to further suppress the content

Follow-up revisions

RevisionCommit summaryAuthorDate
r91180Fixes for r88113 and some realted changes:...ialex15:26, 30 June 2011

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r53027* Refactor all of InitStats crap into SiteStatsInit class. Update all code us...demon00:27, 10 July 2009

Comments

#Comment by Nikerabbit (talk | contribs)   06:06, 15 May 2011

/me dislikes the new (and old) unreadable boolean parameters.

#Comment by Aaron Schulz (talk | contribs)   06:12, 24 June 2011

1) If you have the 'comma' counting method set, and do something like:

$a = new Article( $title_of_existing_page );
$a->doEdit( ... );

...nothing will get called that process caches the text getRawText() needs (I don't see anything that sets contentLoaded yet). Then updateRevisionOn() will get called and a COMMIT done to the DB before $this->editUpdates(). This means that page_latest is already updated (even if we didn't commit, if we happened hit the same DB, a MySQL session will "see" it's own uncommitted changes).

editUpdates() will get called and cause some lines of code to be hit:

1. $good = (int)$this->isCountable( $editInfo ) - (int)$this->isCountable();
2. $text = $this->getRawText(); [in isCountable]
3. $rev = Revision::newFromTitle( $this->mTitle ); [in getRawText]

...this will query the DB based on page_latest. This will fetch the new text, not the old text. Even when/if this gets the proper old text, it's still an extra hit (to memcached) to get the text. You should pass $oldtext (from $oldtext = $this->getRawText(); // current revision) down to the editUpdates somehow.

2) I grepped and don't see any extensions called editUpdates(). I suggest refactoring the params. I'd combine $minoredit, $created, and $changed into one $flags param using constants instead of booleans. $timestamp_of_pagechange could also be removed for being unused. If we really need b/c, then we can make a doEditUpdates() like this and leave editUpdates() as wrapper around it.

#Comment by IAlex (talk | contribs)   15:27, 30 June 2011

Done in r91180.

Status & tagging log