r94502 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r94501‎ | r94502 | r94503 >
Date:12:20, 15 August 2011
Author:catrope
Status:resolved (Comments)
Tags:
Comment:
(bug 30269) Strings like foobar//barfoo are linked to become foobar[//barfoo]

* Introduce a boolean parameter to wfUrlProtocols() which, if set to false, will cause '//' to be dropped from the returned regex so it doesn't match protocol-relative URLs
* Introduce wfUrlProtocolsWithoutProtRel() as a wrapper for wfUrlProtocols( false ). The latter should not be used directly because the former is much clearer
* Use this new function in Parser::doMagicLinks() to fix the original bug. Also use it in ApiFormatBase::formatHTML() and CodeCommentLinker::link(), which probably had similar bugs
Modified paths:
  • /trunk/extensions/CodeReview/backend/CodeCommentLinker.php (modified) (history)
  • /trunk/phase3/includes/GlobalFunctions.php (modified) (history)
  • /trunk/phase3/includes/api/ApiFormatBase.php (modified) (history)
  • /trunk/phase3/includes/parser/Parser.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/GlobalFunctions.php
@@ -469,14 +469,17 @@
470470 /**
471471 * Returns a regular expression of url protocols
472472 *
 473+ * @param $includeProtocolRelative bool If false, remove '//' from the returned protocol list.
 474+ * DO NOT USE this directy, use wfUrlProtocolsWithoutProtRel() instead
473475 * @return String
474476 */
475 -function wfUrlProtocols() {
 477+function wfUrlProtocols( $includeProtocolRelative = true ) {
476478 global $wgUrlProtocols;
477479
478 - static $retval = null;
479 - if ( !is_null( $retval ) ) {
480 - return $retval;
 480+ // Cache return values separately based on $includeProtocolRelative
 481+ static $retval = array( true => null, false => null );
 482+ if ( !is_null( $retval[$includeProtocolRelative] ) ) {
 483+ return $retval[$includeProtocolRelative];
481484 }
482485
483486 // Support old-style $wgUrlProtocols strings, for backwards compatibility
@@ -484,17 +487,33 @@
485488 if ( is_array( $wgUrlProtocols ) ) {
486489 $protocols = array();
487490 foreach ( $wgUrlProtocols as $protocol ) {
488 - $protocols[] = preg_quote( $protocol, '/' );
 491+ // Filter out '//' if !$includeProtocolRelative
 492+ if ( $includeProtocolRelative || $protocol !== '//' ) {
 493+ $protocols[] = preg_quote( $protocol, '/' );
 494+ }
489495 }
490496
491 - $retval = implode( '|', $protocols );
 497+ $retval[$includeProtocolRelative] = implode( '|', $protocols );
492498 } else {
493 - $retval = $wgUrlProtocols;
 499+ // Ignore $includeProtocolRelative in this case
 500+ // This case exists for pre-1.6 compatibility, and we can safely assume
 501+ // that '//' won't appear in a pre-1.6 config because protocol-relative
 502+ // URLs weren't supported until 1.18
 503+ $retval[$includeProtocolRelative] = $wgUrlProtocols;
494504 }
495 - return $retval;
 505+ return $retval[$includeProtocolRelative];
496506 }
497507
498508 /**
 509+ * Like wfUrlProtocols(), but excludes '//' from the protocol list. Use this if
 510+ * you need a regex that matches all URL protocols but does not match protocol-
 511+ * relative URLs
 512+ */
 513+function wfUrlProtocolsWithoutProtRel() {
 514+ return wfUrlProtocols( false );
 515+}
 516+
 517+/**
499518 * parse_url() work-alike, but non-broken. Differences:
500519 *
501520 * 1) Does not raise warnings on bad URLs (just returns false)
Index: trunk/phase3/includes/parser/Parser.php
@@ -1230,7 +1230,7 @@
12311231 */
12321232 function doMagicLinks( $text ) {
12331233 wfProfileIn( __METHOD__ );
1234 - $prots = $this->mUrlProtocols;
 1234+ $prots = wfUrlProtocolsWithoutProtRel();
12351235 $urlChar = self::EXT_LINK_URL_CLASS;
12361236 $text = preg_replace_callback(
12371237 '!(?: # Start cases
Index: trunk/phase3/includes/api/ApiFormatBase.php
@@ -263,7 +263,7 @@
264264 // encode all comments or tags as safe blue strings
265265 $text = preg_replace( '/\&lt;(!--.*?--|.*?)\&gt;/', '<span style="color:blue;">&lt;\1&gt;</span>', $text );
266266 // identify URLs
267 - $protos = wfUrlProtocols();
 267+ $protos = wfUrlProtocolsWithoutProtRel();
268268 // This regex hacks around bug 13218 (&quot; included in the URL)
269269 $text = preg_replace( "#(($protos).*?)(&quot;)?([ \\'\"<>\n]|&lt;|&gt;|&quot;)#", '<a href="\\1">\\1</a>\\3\\4', $text );
270270 // identify requests to api.php
Index: trunk/extensions/CodeReview/backend/CodeCommentLinker.php
@@ -28,7 +28,7 @@
2929 function link( $text ) {
3030 # Catch links like http://www.mediawiki.org/wiki/Special:Code/MediaWiki/44245#c829
3131 # Ended by space or brackets (like those pesky <br /> tags)
32 - $text = preg_replace_callback( '/(^|[^\w[])(' . wfUrlProtocols() . ')(' . Parser::EXT_LINK_URL_CLASS . '+)/',
 32+ $text = preg_replace_callback( '/(^|[^\w[])(' . wfUrlProtocolsWithoutProtRel() . ')(' . Parser::EXT_LINK_URL_CLASS . '+)/',
3333 array( $this, 'generalLink' ), $text );
3434 $text = preg_replace_callback( '/\br(\d+)\b/',
3535 array( $this, 'messageRevLink' ), $text );

Follow-up revisions

RevisionCommit summaryAuthorDate
r94504Add parser test for r94502catrope12:25, 15 August 2011
r94511Followup r94502: per CR, use two caching variables instead of an array indexe...catrope13:16, 15 August 2011
r945621.17wmf1: MFT r93711, r94346, r94369, r94376, r94404, r94502, r94509, r94511catrope20:30, 15 August 2011
r964701.17wmf1: MFT r92962, r93062, r93093, r93385, r93468, r93473, r94350, r94502,...catrope19:07, 7 September 2011

Comments

#Comment by Tim Starling (talk | contribs)   12:52, 15 August 2011
static $retval = array( true => null, false => null );

I think you're better off avoiding trying to index arrays with booleans. If someone sees you doing it, they might think it actually works as written, and they'll get into trouble.

Arrays in PHP have a string index. What's happening here is that the boolean values are being converted to integers and then to strings:

> $a = array(true => '', false => '' );

> var_dump($a)
array(2) {
  [1]=>
  string(0) ""
  [0]=>
  string(0) ""
}

> unset($a[1]);

> var_dump($a)
array(1) {
  [0]=>
  string(0) ""
}

Otherwise OK.

#Comment by Catrope (talk | contribs)   13:11, 15 August 2011

Good point. I'll just use two caching vars. I was aware that the actual indices wouldn't be booleans, but hadn't really thought about the implications when mixing with non-boolean indices.

Status & tagging log