r103945 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r103945
Date:19:33, 22 November 2011
Author:brion
Status:resolved (Comments)
Tags:
Comment:
* (bug 32376) XML export tweak to use canonical user namespaces instead of gendered ones for now

The gendered namespaces aren't listed in the <siteinfo> namespaces list, and just adding them as-is may break reader implementations.
Using the canonical form should have little effect on most uses of the dumps while keeping existing readers working.

Should resolve Lucene search problems (bug 31697) related to the Lucene search indexer not understanding the gendered namespace names.
After the fix and a reindex, the canonical forms should be returned and automatically transformed back into the appropriate gendered forms in the search view.
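To illustrate why unlisted gendered prefixes are risky for dump readers, here is a minimal sketch of a hypothetical reader that resolves a title's namespace using only the names advertised in <siteinfo>. The function name resolveNamespace and the German namespace names and IDs are assumptions for illustration. They are not taken from a real dump or from any particular reader implementation.

```php
<?php
// Hypothetical dump reader: resolves a page title's namespace using only the
// namespace names advertised in <siteinfo>. Names and IDs are illustrative.
$siteinfoNamespaces = [
    'Benutzer' => 2,             // User
    'Benutzer Diskussion' => 3,  // User talk
];

/**
 * Split "Prefix:Rest" on the first colon and look the prefix up in the
 * siteinfo list. An unknown prefix means the colon is treated as part of
 * a main-namespace (ID 0) title.
 *
 * @return array{0:int,1:string} [namespace ID, title without prefix]
 */
function resolveNamespace(string $title, array $namespaces): array
{
    $pos = strpos($title, ':');
    if ($pos !== false) {
        $prefix = substr($title, 0, $pos);
        if (isset($namespaces[$prefix])) {
            return [$namespaces[$prefix], substr($title, $pos + 1)];
        }
    }
    return [0, $title];
}

// Content-language prefix: resolved to the User namespace.
[$ns, $rest] = resolveNamespace('Benutzer:Beispiel', $siteinfoNamespaces);
printf("%d %s\n", $ns, $rest); // 2 Beispiel

// Gendered prefix missing from <siteinfo>: misread as a main-namespace page.
[$ns, $rest] = resolveNamespace('Benutzerin:Beispiel', $siteinfoNamespaces);
printf("%d %s\n", $ns, $rest); // 0 Benutzerin:Beispiel
```

A reader built this way files every gendered user page under the main namespace. That is the kind of confusion the commit message attributes to the Lucene search indexer.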

[per IRC discussion check-up with Ariel about this bug and the patches on bug 30513 which cover related namespacey issues]
Modified paths:
  • /trunk/phase3/includes/Export.php (modified)

Diff

Index: trunk/phase3/includes/Export.php
@@ -476,14 +476,14 @@
     function openPage( $row ) {
         $out = " <page>\n";
         $title = Title::makeTitle( $row->page_namespace, $row->page_title );
-        $out .= ' ' . Xml::elementClean( 'title', array(), $title->getPrefixedText() ) . "\n";
+        $out .= ' ' . Xml::elementClean( 'title', array(), self::canonicalTitle( $title ) ) . "\n";
         $out .= ' ' . Xml::element( 'ns', array(), strval( $row->page_namespace) ) . "\n";
         $out .= ' ' . Xml::element( 'id', array(), strval( $row->page_id ) ) . "\n";
         if ( $row->page_is_redirect ) {
             $page = WikiPage::factory( $title );
             $redirect = $page->getRedirectTarget();
             if ( $redirect instanceOf Title && $redirect->isValidRedirectTarget() ) {
-                $out .= ' ' . Xml::element( 'redirect', array( 'title' => $redirect->getPrefixedText() ) ) . "\n";
+                $out .= ' ' . Xml::element( 'redirect', array( 'title' => self::canonicalTitle( $redirect ) ) ) . "\n";
             }
         }
         if ( $row->page_restrictions != '' ) {
@@ -595,7 +595,7 @@
             $out .= " " . Xml::element( 'text', array( 'deleted' => 'deleted' ) ) . "\n";
         } else {
             $title = Title::makeTitle( $row->log_namespace, $row->log_title );
-            $out .= " " . Xml::elementClean( 'logtitle', null, $title->getPrefixedText() ) . "\n";
+            $out .= " " . Xml::elementClean( 'logtitle', null, self::canonicalTitle( $title ) ) . "\n";
             $out .= " " . Xml::elementClean( 'params',
                 array( 'xml:space' => 'preserve' ),
                 strval( $row->log_params ) ) . "\n";
@@ -677,6 +677,29 @@
             " </upload>\n";
     }
 
+    /**
+     * Return prefixed text form of title, but using the content language's
+     * canonical namespace. This skips any special-casing such as gendered
+     * user namespaces -- which while useful, are not yet listed in the
+     * XML <siteinfo> data so are unsafe in export.
+     *
+     * @param Title $title
+     * @return string
+     */
+    public static function canonicalTitle( Title $title ) {
+        if ( $title->getInterwiki() ) {
+            return $title->getPrefixedText();
+        }
+
+        global $wgContLang;
+        $prefix = $wgContLang->getNsText( $title->getNamespace() );
+
+        if ($prefix !== '') {
+            $prefix .= ':';
+        }
+
+        return $prefix . $title->getText();
+    }
 }
 
 
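For readers without a MediaWiki install, the following self-contained sketch mirrors the logic of canonicalTitle() in the diff. StubTitle, sketchCanonicalTitle and the $contentNsNames table are hypothetical stand-ins for Title and $wgContLang->getNsText(), not MediaWiki APIs. The table uses underscore-form names, consistent with the "User_talk" prefixes reported in the comments below. The fallback to '' for unknown namespace IDs is an assumption of this sketch.

```php
<?php
// Self-contained sketch of the canonicalTitle() logic from the patch.
// StubTitle and $contentNsNames are hypothetical stand-ins, not MediaWiki APIs.

final class StubTitle
{
    public function __construct(
        private int $namespace,
        private string $text,          // title text without prefix, spaces form
        private string $interwiki = ''
    ) {}

    public function getNamespace(): int { return $this->namespace; }
    public function getText(): string { return $this->text; }
    public function getInterwiki(): string { return $this->interwiki; }

    // Simplified: only used for the interwiki pass-through below.
    public function getPrefixedText(): string
    {
        return $this->interwiki . ':' . $this->text;
    }
}

// Content-language namespace names in underscore form (hypothetical table).
$contentNsNames = [0 => '', 2 => 'User', 3 => 'User_talk'];

function sketchCanonicalTitle(StubTitle $title, array $nsNames): string
{
    // Interwiki titles keep their prefixed form untouched.
    if ($title->getInterwiki() !== '') {
        return $title->getPrefixedText();
    }
    // Unknown IDs fall back to '' in this sketch.
    $prefix = $nsNames[$title->getNamespace()] ?? '';
    if ($prefix !== '') {
        $prefix .= ':';
    }
    return $prefix . $title->getText();
}

echo sketchCanonicalTitle(new StubTitle(0, 'Main Page'), $contentNsNames), "\n";    // Main Page
echo sketchCanonicalTitle(new StubTitle(3, 'Example user'), $contentNsNames), "\n"; // User_talk:Example user
```

Note the second result: the namespace text is joined exactly as stored, underscores included, while the title text keeps its spaces.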

Follow-up revisions

Revision | Commit summary | Author | Date
r103953 | MFT r103945: use canonical user NS in export, workaround for search prefix bu... | brion | 20:19, 22 November 2011
r103954 | MFT r103945: use canonical user NS in export, workaround for search prefix bu... | brion | 20:20, 22 November 2011
r104124 | Followup r103945: fix regression in XML export namespace formatting for names... | brion | 01:26, 24 November 2011
r104125 | MFT r104124 (fixes XML export regression in r103945 / r103953 on branch) | brion | 01:29, 24 November 2011
r104126 | MFT r104124 (fixes regression in r103945; r103954 on branch) | brion | 01:29, 24 November 2011
r107202 | Followup r103945 - @since and whitespace | nikerabbit | 14:40, 24 December 2011
r108764 | This commit restores getPrefixedText() to the Title in the XML. This means i... | diederik | 21:34, 12 January 2012
r109889 | Added the following three items:... | diederik | 01:14, 24 January 2012

Past revisions this follows-up on

Revision | Commit summary | Author | Date
r102575 | Commit to fix bug 30513. | diederik | 21:15, 9 November 2011

Comments

Comment by Brion VIBBER, 19:50, 22 November 2011

Needs merge & deploy for bug 31697 search problems with gendered namespaces.

Comment by Rainman, 00:16, 24 November 2011

There is still a problem. This has been pushed to the live site, right?

Now the <siteinfo> header lists the namespace as "User talk", but the actual page titles begin with "User_talk" (note the underscore). This now breaks parsing on all wikis, not only on those with gendered namespaces.
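A hypothetical check of the mismatch described above: <siteinfo> lists namespace names with spaces, while the patched export produced underscore prefixes. The helper names and the strtr()-based normalization are assumptions for illustration. They are not necessarily what r104124 does.

```php
<?php
// Hypothetical illustration: <siteinfo> advertises namespace names with
// spaces, while the patched exporter emitted underscore prefixes.
$siteinfoNames = ['User', 'User talk'];

// True if the part before the first colon is a namespace name listed in <siteinfo>.
function prefixIsAdvertised(string $title, array $siteinfoNames): bool
{
    $pos = strpos($title, ':');
    return $pos !== false
        && in_array(substr($title, 0, $pos), $siteinfoNames, true);
}

// One possible normalization (an assumption): convert underscores in the
// namespace name to spaces before joining it to the title text.
function joinWithDisplayPrefix(string $nsName, string $text): string
{
    $prefix = strtr($nsName, '_', ' ');
    return $prefix === '' ? $text : $prefix . ':' . $text;
}

var_dump(prefixIsAdvertised('User_talk:Example', $siteinfoNames)); // bool(false)
var_dump(prefixIsAdvertised(joinWithDisplayPrefix('User_talk', 'Example'), $siteinfoNames)); // bool(true)
```

Normalizing on the export side keeps the <title> prefixes identical to the names in <siteinfo>, so readers need no special-casing.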

Also reported here: http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Weird_search_results

Comment by Brion VIBBER, 01:20, 24 November 2011

D'oh! I see it.

Comment by Brion VIBBER, 01:22, 24 November 2011

Definitely one of those times I want to go back 10 years with a cluebat and knock around Young Magnus, Young(ish) Lee, and Young Brion and KILL THE UNDERSCORES. :)

Comment by Brion VIBBER, 01:27, 24 November 2011

r104124 should now fix this.

Comment by Rainman, 11:50, 24 November 2011

Looks good now, thanks!
