r65104 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r65103‎ | r65104 | r65105 >
Date:01:40, 16 April 2010
Author:brion
Status:resolved (Comments)
Tags:schema 
Comment:
* (bug 14473) Add iwlinks table to track inline interwiki link usage

Like langlinks, this stores the interwiki prefix (as iwl_prefix) and full page title (as iwl_title), attached to the page doing the liking (as iwl_from -> page_id).
Unlike langlinks, there can be multiple entries stored per interwiki prefix.

Updater to add the table confirmed on MySQL, untested on SQLite but should work.
Someone may still need to add and test a PostgreSQL updater.

Refactored makeWhereFrom2d() out of LinkBatch to Database so it could be re-used for the similar mapping for the interwiki links, which need a string prefix rather than an int namespace key.
Also cleaned it up internally to reuse existing code for building where clauses from arrays. (Tim & Domas -- if the previous more verbose code was there to reduce function call and array processing overhead on very large link lists, feel free to unroll it again if the difference is measurable. Just swap the var names around from the old LinkBatch code and escape the base key value if it's not an integer, it'll be functionally equivalent.)
Modified paths:
  • /trunk/phase3/includes/LinkBatch.php (modified) (history)
  • /trunk/phase3/includes/LinksUpdate.php (modified) (history)
  • /trunk/phase3/includes/db/Database.php (modified) (history)
  • /trunk/phase3/includes/parser/LinkHolderArray.php (modified) (history)
  • /trunk/phase3/includes/parser/Parser.php (modified) (history)
  • /trunk/phase3/includes/parser/ParserOutput.php (modified) (history)
  • /trunk/phase3/maintenance/archives/patch-iwlinks.sql (added) (history)
  • /trunk/phase3/maintenance/parserTests.inc (modified) (history)
  • /trunk/phase3/maintenance/tables.sql (modified) (history)
  • /trunk/phase3/maintenance/updaters.inc (modified) (history)

Diff [purge]

Index: trunk/phase3/maintenance/archives/patch-iwlinks.sql
@@ -0,0 +1,16 @@
 2+--
 3+-- Track inline interwiki links
 4+--
 5+CREATE TABLE /*_*/iwlinks (
 6+ -- page_id of the referring page
 7+ iwl_from int unsigned NOT NULL default 0,
 8+
 9+ -- Interwiki prefix code of the target
 10+ iwl_prefix varbinary(20) NOT NULL default '',
 11+
 12+ -- Title of the target, including namespace
 13+ iwl_title varchar(255) binary NOT NULL default ''
 14+) /*$wgDBTableOptions*/;
 15+
 16+CREATE UNIQUE INDEX /*i*/iwl_from ON /*_*/iwlinks (iwl_from, iwl_prefix, iwl_title);
 17+CREATE INDEX /*i*/iwl_prefix ON /*_*/iwlinks (iwl_prefix, iwl_title);
Property changes on: trunk/phase3/maintenance/archives/patch-iwlinks.sql
___________________________________________________________________
Added: svn:eol-style
118 + native
Index: trunk/phase3/maintenance/parserTests.inc
@@ -602,7 +602,7 @@
603603 global $wgDBtype;
604604 $tables = array('user', 'page', 'page_restrictions',
605605 'protected_titles', 'revision', 'text', 'pagelinks', 'imagelinks',
606 - 'categorylinks', 'templatelinks', 'externallinks', 'langlinks',
 606+ 'categorylinks', 'templatelinks', 'externallinks', 'langlinks', 'iwlinks',
607607 'site_stats', 'hitcounter', 'ipblocks', 'image', 'oldimage',
608608 'recentchanges', 'watchlist', 'math', 'interwiki',
609609 'querycache', 'objectcache', 'job', 'l10n_cache', 'redirect', 'querycachetwo',
Index: trunk/phase3/maintenance/updaters.inc
@@ -172,6 +172,9 @@
173173 array( 'do_update_mime_minor_field' ),
174174 // Should've done this back in 1.10, but better late than never:
175175 array( 'do_populate_rev_len' ),
 176+
 177+ // 1.17
 178+ array( 'add_table', 'iwlinks', 'patch-iwlinks.sql' ),
176179 ),
177180
178181 'sqlite' => array(
@@ -199,6 +202,9 @@
200203 array( 'do_update_transcache_field' ),
201204 // version-independent searchindex setup, added in 1.16
202205 array( 'sqlite_setup_searchindex' ),
 206+
 207+ // 1.17
 208+ array( 'add_table', 'iwlinks', 'patch-iwlinks.sql' ), // @fixme so far untested on sqlite 2010-04-16
203209 ),
204210 );
205211
@@ -1590,6 +1596,7 @@
15911597 array('user_properties', 'patch-user_properties.sql'),
15921598 array('log_search', 'patch-log_search.sql'),
15931599 array('l10n_cache', 'patch-l10n_cache.sql'),
 1600+ // @fixme add iwlinks table
15941601 );
15951602
15961603 $newcols = array(
Index: trunk/phase3/maintenance/tables.sql
@@ -607,7 +607,25 @@
608608 CREATE INDEX /*i*/ll_lang ON /*_*/langlinks (ll_lang, ll_title);
609609
610610
 611+--
 612+-- Track inline interwiki links
611613 --
 614+CREATE TABLE /*_*/iwlinks (
 615+ -- page_id of the referring page
 616+ iwl_from int unsigned NOT NULL default 0,
 617+
 618+ -- Interwiki prefix code of the target
 619+ iwl_prefix varbinary(20) NOT NULL default '',
 620+
 621+ -- Title of the target, including namespace
 622+ iwl_title varchar(255) binary NOT NULL default ''
 623+) /*$wgDBTableOptions*/;
 624+
 625+CREATE UNIQUE INDEX /*i*/iwl_from ON /*_*/iwlinks (iwl_from, iwl_prefix, iwl_title);
 626+CREATE INDEX /*i*/iwl_prefix ON /*_*/iwlinks (iwl_prefix, iwl_title);
 627+
 628+
 629+--
612630 -- Contains a single row with some aggregate info
613631 -- on the state of the site.
614632 --
Index: trunk/phase3/includes/LinkBatch.php
@@ -144,48 +144,9 @@
145145 *
146146 * @param $prefix String: the appropriate table's field name prefix ('page', 'pl', etc)
147147 * @param $db DatabaseBase object to use
148 - * @return String
 148+ * @return mixed string with SQL where clause fragment, or false if no items.
149149 */
150 - public function constructSet( $prefix, &$db ) {
151 - $first = true;
152 - $firstTitle = true;
153 - $sql = '';
154 - foreach ( $this->data as $ns => $dbkeys ) {
155 - if ( !count( $dbkeys ) ) {
156 - continue;
157 - }
158 -
159 - if ( $first ) {
160 - $first = false;
161 - } else {
162 - $sql .= ' OR ';
163 - }
164 -
165 - if (count($dbkeys)==1) { // avoid multiple-reference syntax if simple equality can be used
166 - $singleKey = array_keys($dbkeys);
167 - $sql .= "({$prefix}_namespace=$ns AND {$prefix}_title=".
168 - $db->addQuotes($singleKey[0]).
169 - ")";
170 - } else {
171 - $sql .= "({$prefix}_namespace=$ns AND {$prefix}_title IN (";
172 -
173 - $firstTitle = true;
174 - foreach( $dbkeys as $dbkey => $unused ) {
175 - if ( $firstTitle ) {
176 - $firstTitle = false;
177 - } else {
178 - $sql .= ',';
179 - }
180 - $sql .= $db->addQuotes( $dbkey );
181 - }
182 - $sql .= '))';
183 - }
184 - }
185 - if ( $first && $firstTitle ) {
186 - # No titles added
187 - return false;
188 - } else {
189 - return $sql;
190 - }
 150+ public function constructSet( $prefix, $db ) {
 151+ return $db->makeWhereFrom2d( $this->data, "{$prefix}_namespace", "{$prefix}_title" );
191152 }
192153 }
Index: trunk/phase3/includes/parser/LinkHolderArray.php
@@ -162,6 +162,9 @@
163163 # Check if it's a static known link, e.g. interwiki
164164 if ( $title->isAlwaysKnown() ) {
165165 $colours[$pdbk] = '';
 166+ if( $title->getInterwiki() != '' ) {
 167+ $output->addInterwikiLink( $title );
 168+ }
166169 } elseif ( ( $id = $linkCache->getGoodLinkID( $pdbk ) ) != 0 ) {
167170 $colours[$pdbk] = $sk->getLinkColour( $title, $threshold );
168171 $output->addLink( $title, $id );
Index: trunk/phase3/includes/parser/Parser.php
@@ -1830,7 +1830,7 @@
18311831 #
18321832 # FIXME: isAlwaysKnown() can be expensive for file links; we should really do
18331833 # batch file existence checks for NS_FILE and NS_MEDIA
1834 - if ( $iw == '' && $nt->isAlwaysKnown() ) {
 1834+ if ( $iw = '' && $nt->isAlwaysKnown() ) {
18351835 $this->mOutput->addLink( $nt );
18361836 $s .= $this->makeKnownLinkHolder( $nt, $text, '', $trail, $prefix );
18371837 } else {
Index: trunk/phase3/includes/parser/ParserOutput.php
@@ -17,6 +17,7 @@
1818 $mTemplateIds = array(), # 2-D map of NS/DBK to rev ID for the template references. ID=zero for broken.
1919 $mImages = array(), # DB keys of the images used, in the array key only
2020 $mExternalLinks = array(), # External link URLs, in the key only
 21+ $mInterwikiLinks = array(), # 2-D map of prefix/DBK (in keys only) for the inline interwiki links in the document.
2122 $mNewSection = false, # Show a new section link?
2223 $mHideNewSection = false, # Hide the new section link?
2324 $mNoGallery = false, # No gallery on category page? (__NOGALLERY__)
@@ -40,6 +41,7 @@
4142
4243 function getText() { return $this->mText; }
4344 function &getLanguageLinks() { return $this->mLanguageLinks; }
 45+ function getInterwikiLinks() { return $this->mInterwikiLinks; }
4446 function getCategoryLinks() { return array_keys( $this->mCategories ); }
4547 function &getCategories() { return $this->mCategories; }
4648 function getCacheTime() { return $this->mCacheTime; }
@@ -96,9 +98,17 @@
9799 $this->mExternalLinks[$url] = 1;
98100 }
99101
 102+ /**
 103+ * Record a local or interwiki inline link for saving in future link tables.
 104+ *
 105+ * @param Title $title
 106+ * @param mixed $id optional known page_id so we can skip the lookup
 107+ */
100108 function addLink( $title, $id = null ) {
 109+ wfDebug(__METHOD__ . " got: " . $title->getPrefixedText() . "\n");
101110 if ( $title->isExternal() ) {
102111 // Don't record interwikis in pagelinks
 112+ $this->addInterwikiLink( $title );
103113 return;
104114 }
105115 $ns = $title->getNamespace();
@@ -139,6 +149,21 @@
140150 }
141151 $this->mTemplateIds[$ns][$dbk] = $rev_id; // For versioning
142152 }
 153+
 154+ /**
 155+ * @param Title $title object, must be an interwiki link
 156+ * @throws MWException if given invalid input
 157+ */
 158+ function addInterwikiLink( $title ) {
 159+ $prefix = $title->getInterwiki();
 160+ if( $prefix == '' ) {
 161+ throw new MWException( 'Non-interwiki link passed, internal parser error.' );
 162+ }
 163+ if (!isset($this->mInterwikiLinks[$prefix])) {
 164+ $this->mInterwikiLinks[$prefix] = array();
 165+ }
 166+ $this->mInterwikiLinks[$prefix][$title->getDBkey()] = 1;
 167+ }
143168
144169 /**
145170 * Return true if this cached output object predates the global or
Index: trunk/phase3/includes/db/Database.php
@@ -1274,6 +1274,33 @@
12751275 }
12761276
12771277 /**
 1278+ * Build a partial where clause from a 2-d array such as used for LinkBatch.
 1279+ * The keys on each level may be either integers or strings.
 1280+ *
 1281+ * @param array $data organized as 2-d array(baseKeyVal => array(subKeyVal => <ignored>, ...), ...)
 1282+ * @param string $baseKey field name to match the base-level keys to (eg 'pl_namespace')
 1283+ * @param string $subKey field name to match the sub-level keys to (eg 'pl_title')
 1284+ * @return mixed string SQL fragment, or false if no items in array.
 1285+ */
 1286+ function makeWhereFrom2d( $data, $baseKey, $subKey ) {
 1287+ $conds = array();
 1288+ foreach ( $data as $base => $sub ) {
 1289+ if ( count( $sub ) ) {
 1290+ $conds[] = $this->makeList(
 1291+ array( $baseKey => $base, $subKey => array_keys( $sub ) ),
 1292+ LIST_AND);
 1293+ }
 1294+ }
 1295+
 1296+ if ( $conds ) {
 1297+ return $this->makeList( $conds, LIST_OR );
 1298+ } else {
 1299+ // Nothing to search for...
 1300+ return false;
 1301+ }
 1302+ }
 1303+
 1304+ /**
12781305 * Bitwise operations
12791306 */
12801307
Index: trunk/phase3/includes/LinksUpdate.php
@@ -54,6 +54,7 @@
5555 $this->mExternals = $parserOutput->getExternalLinks();
5656 $this->mCategories = $parserOutput->getCategories();
5757 $this->mProperties = $parserOutput->getProperties();
 58+ $this->mInterwikis = $parserOutput->getInterwikiLinks();
5859
5960 # Convert the format of the interlanguage links
6061 # I didn't want to change it in the ParserOutput, because that array is passed all
@@ -115,6 +116,11 @@
116117 $this->incrTableUpdate( 'langlinks', 'll', $this->getInterlangDeletions( $existing ),
117118 $this->getInterlangInsertions( $existing ) );
118119
 120+ # Inline interwiki links
 121+ $existing = $this->getExistingInterwikis();
 122+ $this->incrTableUpdate( 'iwlinks', 'iwl', $this->getInterwikiDeletions( $existing ),
 123+ $this->getInterwikiInsertions( $existing ) );
 124+
119125 # Template links
120126 $existing = $this->getExistingTemplates();
121127 $this->incrTableUpdate( 'templatelinks', 'tl', $this->getTemplateDeletions( $existing ),
@@ -175,6 +181,7 @@
176182 $this->dumbTableUpdate( 'templatelinks', $this->getTemplateInsertions(), 'tl_from' );
177183 $this->dumbTableUpdate( 'externallinks', $this->getExternalInsertions(), 'el_from' );
178184 $this->dumbTableUpdate( 'langlinks', $this->getInterlangInsertions(),'ll_from' );
 185+ $this->dumbTableUpdate( 'iwlinks', $this->getInterwikiInsertions(),'iwl_from' );
179186 $this->dumbTableUpdate( 'page_props', $this->getPropertyInsertions(), 'pp_page' );
180187
181188 # Update the cache of all the category pages and image description
@@ -292,18 +299,6 @@
293300 }
294301
295302 /**
296 - * Make a WHERE clause from a 2-d NS/dbkey array
297 - *
298 - * @param array $arr 2-d array indexed by namespace and DB key
299 - * @param string $prefix Field name prefix, without the underscore
300 - */
301 - function makeWhereFrom2d( &$arr, $prefix ) {
302 - $lb = new LinkBatch;
303 - $lb->setArray( $arr );
304 - return $lb->constructSet( $prefix, $this->mDb );
305 - }
306 -
307 - /**
308303 * Update a table by doing a delete query then an insert query
309304 * @private
310305 */
@@ -314,8 +309,13 @@
315310 $fromField = "{$prefix}_from";
316311 }
317312 $where = array( $fromField => $this->mId );
318 - if ( $table == 'pagelinks' || $table == 'templatelinks' ) {
319 - $clause = $this->makeWhereFrom2d( $deletions, $prefix );
 313+ if ( $table == 'pagelinks' || $table == 'templatelinks' || $table == 'iwlinks' ) {
 314+ if ( $table == 'iwlinks' ) {
 315+ $baseKey = 'iwl_prefix';
 316+ } else {
 317+ $baseKey = "{$prefix}_namespace";
 318+ }
 319+ $clause = $this->mDb->makeWhereFrom2d( $deletions, $baseKey, "{$prefix}_title" );
320320 if ( $clause ) {
321321 $where[] = $clause;
322322 } else {
@@ -476,7 +476,30 @@
477477 return $arr;
478478 }
479479
 480+ /**
 481+ * Get an array of interwiki insertions for passing to the DB
 482+ * Skips the titles specified by the 2-D array $existing
 483+ * @private
 484+ */
 485+ function getInterwikiInsertions( $existing = array() ) {
 486+ $arr = array();
 487+ foreach( $this->mInterwikis as $prefix => $dbkeys ) {
 488+ # array_diff_key() was introduced in PHP 5.1, there is a compatibility function
 489+ # in GlobalFunctions.php
 490+ $diffs = isset( $existing[$prefix] ) ? array_diff_key( $dbkeys, $existing[$prefix] ) : $dbkeys;
 491+ foreach ( $diffs as $dbk => $id ) {
 492+ $arr[] = array(
 493+ 'iwl_from' => $this->mId,
 494+ 'iwl_prefix' => $prefix,
 495+ 'iwl_title' => $dbk
 496+ );
 497+ }
 498+ }
 499+ return $arr;
 500+ }
480501
 502+
 503+
481504 /**
482505 * Given an array of existing links, returns those links which are not in $this
483506 * and thus should be deleted.
@@ -556,6 +579,23 @@
557580 }
558581
559582 /**
 583+ * Given an array of existing interwiki links, returns those links which are not in $this
 584+ * and thus should be deleted.
 585+ * @private
 586+ */
 587+ function getInterwikiDeletions( $existing ) {
 588+ $del = array();
 589+ foreach ( $existing as $prefix => $dbkeys ) {
 590+ if ( isset( $this->mInterwikis[$prefix] ) ) {
 591+ $del[$prefix] = array_diff_key( $existing[$prefix], $this->mInterwikis[$prefix] );
 592+ } else {
 593+ $del[$prefix] = $existing[$prefix];
 594+ }
 595+ }
 596+ return $del;
 597+ }
 598+
 599+ /**
560600 * Get an array of existing links, as a 2-D array
561601 * @private
562602 */
@@ -652,6 +692,24 @@
653693 }
654694
655695 /**
 696+ * Get an array of existing inline interwiki links, as a 2-D array
 697+ * @return array (prefix => array(dbkey => 1))
 698+ */
 699+ protected function getExistingInterwikis() {
 700+ $res = $this->mDb->select( 'iwlinks', array( 'iwl_prefix', 'iwl_title' ),
 701+ array( 'iwl_from' => $this->mId ), __METHOD__, $this->mOptions );
 702+ $arr = array();
 703+ while ( $row = $this->mDb->fetchObject( $res ) ) {
 704+ if ( !isset( $arr[$row->iwl_prefix] ) ) {
 705+ $arr[$row->iwl_prefix] = array();
 706+ }
 707+ $arr[$row->iwl_prefix][$row->iwl_title] = 1;
 708+ }
 709+ $this->mDb->freeResult( $res );
 710+ return $arr;
 711+ }
 712+
 713+ /**
656714 * Get an array of existing categories, with the name in the key and sort key in the value.
657715 * @private
658716 */

Follow-up revisions

RevisionCommit summaryAuthorDate
r65105Followup to r65104, PG updater/table defoverlordq02:35, 16 April 2010
r65203Cleanup for bugs in r65104 (iwlinks table), that'll teach me not to commit at...brion00:39, 18 April 2010
r66440Part 1 of Bug 23524 - Api Modules as followup to bug 14473 (Add iwlinks table...reedy20:54, 14 May 2010
r66826rm fixme from r65104, everything works fine on SQLitemaxsem13:35, 24 May 2010
r66891* (bug 23524) Api Modules as followup to bug 14473 (Add iwlinks table to trac...reedy19:50, 25 May 2010

Comments

#Comment by OverlordQ (talk | contribs)   03:00, 16 April 2010

Am I missing something non-obvious but what's with this change?

- if ( $iw == && $nt->isAlwaysKnown() ) { + if ( $iw = && $nt->isAlwaysKnown() ) {

And it just doesn't seem to do anything, iwlinks doesn't get populated.

#Comment by MaxSem (talk | contribs)   06:27, 16 April 2010

The updater works on SQLite, the resulting table structure looks correct:

CREATE TABLE iwlinks (
 iwl_from INTEGER  NOT NULL default 0,
 iwl_prefix BLOB NOT NULL default ,
 iwl_title TEXT  NOT NULL default 
 )
#Comment by Catrope (talk | contribs)   12:11, 17 April 2010
-			if ( $iw == '' && $nt->isAlwaysKnown() ) {
+			if ( $iw = '' && $nt->isAlwaysKnown() ) {

As mentioned in the first comment, this is probably wrong and may very well be causing the reported bugginess of the table not being populated.

#Comment by Catrope (talk | contribs)   12:15, 17 April 2010
+		wfDebug(__METHOD__ . " got: " . $title->getPrefixedText() . "\n");

I don't think we want to keep this around, it'd produce a huge amount of clutter in the debug log. Also, there's some inconsistent coding style here and there.

#Comment by Brion VIBBER (talk | contribs)   00:39, 18 April 2010

Good catches:

  • debug line got left in
  • typo returning a line to its original state during editing
  • misplaced save code due to making changes after final testing

Fixed in r65203.

Status & tagging log