r88883 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r88882‎ | r88883 | r88884 >
Date:09:49, 26 May 2011
Author:catrope
Status:resolved (Comments)
Tags:
Comment:
(bug 28840) URLs with dots break because of IE6 security check

* Replace the overly paranoid regex with a function that simulates IE6's behavior
* Remove the UA check in isPathInfoBad(), was causing more problems than it was worth
* Revert r87711, going back to using dots for dots in ResourceLoader URLs, instead of exclamation marks
* Append &* to ResourceLoader URLs. * is an illegal character in extensions, and putting it at the end of the URL ensures that both IE6 and our detection function will deem the URL to have no extension (unless something like .html? appears in the query string, but in that case we're screwed no matter what)
Modified paths:
  • /trunk/phase3/includes/OutputPage.php (modified) (history)
  • /trunk/phase3/includes/WebRequest.php (modified) (history)
  • /trunk/phase3/includes/resourceloader/ResourceLoader.php (modified) (history)
  • /trunk/phase3/includes/resourceloader/ResourceLoaderContext.php (modified) (history)
  • /trunk/phase3/resources/mediawiki/mediawiki.js (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/OutputPage.php
@@ -2522,6 +2522,9 @@
25232523 ksort( $query );
25242524
25252525 $url = wfAppendQuery( $wgLoadScript, $query );
 2526+ // Prevent the IE6 extension check from being triggered (bug 28840)
 2527+ // by appending a character that's invalid in Windows extensions ('*')
 2528+ $url .= '&*';
25262529 if ( $useESI && $wgResourceLoaderUseESI ) {
25272530 $esi = Xml::element( 'esi:include', array( 'src' => $url ) );
25282531 if ( $only == ResourceLoaderModule::TYPE_STYLES ) {
@@ -2532,9 +2535,9 @@
25332536 } else {
25342537 // Automatically select style/script elements
25352538 if ( $only === ResourceLoaderModule::TYPE_STYLES ) {
2536 - $link = Html::linkedStyle( wfAppendQuery( $wgLoadScript, $query ) );
 2539+ $link = Html::linkedStyle( $url );
25372540 } else {
2538 - $link = Html::linkedScript( wfAppendQuery( $wgLoadScript, $query ) );
 2541+ $link = Html::linkedScript( $url );
25392542 }
25402543 }
25412544
Index: trunk/phase3/includes/WebRequest.php
@@ -785,17 +785,10 @@
786786 public function isPathInfoBad() {
787787 global $wgScriptExtension;
788788
789 - if ( isset( $_SERVER['QUERY_STRING'] )
790 - && preg_match( '/\.[^\\/:*?"<>|%]+(#|\?|$)/i', $_SERVER['QUERY_STRING'] ) )
 789+ if ( isset( $_SERVER['QUERY_STRING'] ) )
791790 {
792791 // Bug 28235
793 - // Block only Internet Explorer, and requests with missing UA
794 - // headers that could be IE users behind a privacy proxy.
795 - if ( !isset( $_SERVER['HTTP_USER_AGENT'] )
796 - || preg_match( '/; *MSIE/', $_SERVER['HTTP_USER_AGENT'] ) )
797 - {
798 - return true;
799 - }
 792+ return strval( self::findIE6Extension( $_SERVER['QUERY_STRING'] ) ) !== '';
800793 }
801794
802795 if ( !isset( $_SERVER['PATH_INFO'] ) ) {
@@ -809,6 +802,69 @@
810803 $ext = substr( $pi, $dotPos );
811804 return !in_array( $ext, array( $wgScriptExtension, '.php', '.php5' ) );
812805 }
 806+
 807+ /**
 808+ * Determine what extension IE6 will infer from a certain query string.
 809+ * If the URL has an extension before the question mark, IE6 will use
 810+ * that and ignore the query string, but per the comment at
 811+ * isPathInfoBad() we don't have a reliable way to determine the URL,
 812+ * so isPathInfoBad() just passes in the query string for $url.
 813+ * All entry points have safe extensions (php, php5) anyway, so
 814+ * checking the query string is possibly overly paranoid but never
 815+ * insecure.
 816+ *
 817+ * The criteria for finding an extension are as follows:
 818+ * * a possible extension is a dot followed by one or more characters not in <>\"/:|?.#
 819+ * * if we find a possible extension followed by the end of the string or a #, that's our extension
 820+ * * if we find a possible extension followed by a ?, that's our extension
 821+ * ** UNLESS it's exe, dll or cgi, in which case we ignore it and continue searching for another possible extension
 822+ * * if we find a possible extension followed by a dot or another illegal character, we ignore it and continue searching
 823+ *
 824+ * @param $url string URL
 825+ * @return mixed Detected extension (string), or false if none found
 826+ */
 827+ public static function findIE6Extension( $url ) {
 828+ $remaining = $url;
 829+ while ( $remaining !== '' ) {
 830+ // Skip ahead to the next dot
 831+ $pos = strcspn( $remaining, '.' );
 832+ if ( $pos === strlen( $remaining ) || $remaining[$pos] === '#' ) {
 833+ // End of string, we're done
 834+ return false;
 835+ }
 836+
 837+ // We found a dot. Skip past it
 838+ $remaining = substr( $remaining, $pos + 1 );
 839+ // Check for illegal characters in our prospective extension,
 840+ // or for another dot, or for a hash sign
 841+ $pos = strcspn( $remaining, "<>\\\"/:|?*.#" );
 842+ if ( $pos === strlen( $remaining ) ) {
 843+ // No illegal character or next dot or hash
 844+ // We have our extension
 845+ return $remaining;
 846+ }
 847+ if ( $remaining[$pos] === '#' ) {
 848+ // We've found a legal extension followed by a hash
 849+ // Trim the hash and everything after it, then return that
 850+ return substr( $remaining, 0, $pos );
 851+ }
 852+ if ( $remaining[$pos] === '?' ) {
 853+ // We've found a legal extension followed by a question mark
 854+ // If the extension is NOT exe, dll or cgi, return it
 855+ $extension = substr( $remaining, 0, $pos );
 856+ if ( strcasecmp( $extension, 'exe' ) && strcasecmp( $extension, 'dll' ) &&
 857+ strcasecmp( $extension, 'cgi' ) )
 858+ {
 859+ return $extension;
 860+ }
 861+ // Else continue looking
 862+ }
 863+ // We found an illegal character or another dot
 864+ // Skip to that character and continue the loop
 865+ $remaining = substr( $remaining, $pos );
 866+ }
 867+ return false;
 868+ }
813869
814870 /**
815871 * Parse the Accept-Language header sent by the client into an array
Index: trunk/phase3/includes/resourceloader/ResourceLoader.php
@@ -736,8 +736,7 @@
737737 * Convert an array of module names to a packed query string.
738738 *
739739 * For example, array( 'foo.bar', 'foo.baz', 'bar.baz', 'bar.quux' )
740 - * becomes 'foo!bar,baz|bar!baz,quux'
741 - * The ! is for IE6 being stupid with extensions.
 740+ * becomes 'foo.bar,baz|bar.baz,quux'
742741 * @param $modules array of module names (strings)
743742 * @return string Packed query string
744743 */
@@ -756,7 +755,7 @@
757756 $arr[] = $p . implode( ',', $suffixes );
758757 }
759758 $str = implode( '|', $arr );
760 - return str_replace( ".", "!", $str ); # bug 28840
 759+ return $str;
761760 }
762761
763762 /**
Index: trunk/phase3/includes/resourceloader/ResourceLoaderContext.php
@@ -67,13 +67,12 @@
6868 /**
6969 * Expand a string of the form jquery.foo,bar|jquery.ui.baz,quux to
7070 * an array of module names like array( 'jquery.foo', 'jquery.bar',
71 - * 'jquery.ui.baz', 'jquery.ui.quux' ) Also translating ! to .
 71+ * 'jquery.ui.baz', 'jquery.ui.quux' )
7272 * @param $modules String Packed module name list
7373 * @return array of module names
7474 */
7575 public static function expandModuleNames( $modules ) {
7676 $retval = array();
77 - $modules = str_replace( "!", ".", $modules ); # bug 28840 - IE is stupid.
7877 $exploded = explode( '|', $modules );
7978 foreach ( $exploded as $group ) {
8079 if ( strpos( $group, ',' ) === false ) {
Index: trunk/phase3/resources/mediawiki/mediawiki.js
@@ -626,7 +626,7 @@
627627 var p = prefix === '' ? '' : prefix + '.';
628628 arr.push( p + moduleMap[prefix].join( ',' ) );
629629 }
630 - return arr.join( '|' ).replace( /\./g, '!' );
 630+ return arr.join( '|' );
631631 }
632632
633633 /**
@@ -780,7 +780,8 @@
781781 // Asynchronously append a script tag to the end of the body
782782 for ( var r = 0; r < requests.length; r++ ) {
783783 requests[r] = sortQuery( requests[r] );
784 - var src = mw.config.get( 'wgLoadScript' ) + '?' + $.param( requests[r] );
 784+ // Append &* to avoid triggering the IE6 extension check
 785+ var src = mw.config.get( 'wgLoadScript' ) + '?' + $.param( requests[r] ) + '&*';
785786 addScript( src );
786787 }
787788 };

Follow-up revisions

RevisionCommit summaryAuthorDate
r89197Tests for r88883, including two failing tests.tstarling02:09, 31 May 2011
r89245Fixes for r88883, r89197:...tstarling00:51, 1 June 2011
r89248* Only blacklist query string extensions which match /^[a-zA-Z0-9_-]+$/. This...tstarling02:01, 1 June 2011
r89397(bug 28840) If the query string hits bug 28235, redirect to a safer URL inste...tstarling05:32, 3 June 2011
r89627MFT r89397, r89558, etc.: bug 28840 IE URL extensiontstarling05:57, 7 June 2011
r89628Merge r89627 from 1.18, equivalent to trunk r89558, r89397, etc.: bug 28840 I...tstarling07:00, 7 June 2011
r91440* Merged r89628 from 1.17: bug 28840 IE URL extension fixes....tstarling06:35, 5 July 2011

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r87711(bug 28840) Commit patch by bawolff that encodes dots in ResourceLoader modul...catrope13:10, 9 May 2011

Comments

#Comment by Krinkle (talk | contribs)   10:11, 26 May 2011

... (unless something like .html? appears in the query string, but in that case we're screwed no matter what) ...

Does that mean that a module named "mw.foo.html" must not exist ? I'm not sure what url variations are possible, but are there potential situations in which a question mark could end up after a module name in a load.php requests ?

I dont think so, but just repeating here in case someone else thinks differently.

#Comment by Catrope (talk | contribs)   19:23, 26 May 2011

No, a question mark can't end up in the middle of an RL query string, except if a question mark appears in a module name. But we could (and probably should) just disallow that.

#Comment by Tim Starling (talk | contribs)   02:27, 31 May 2011

It doesn't match IE when the URL has a hash character in it. An extension after a hash is meant to be ignored. I committed failing test cases in r89197.

Also, scanning a string by progressively taking right hand substrings is O(N^2). The reason is that substr() requires copying the substring, which is O(N). In the worst case, you copy N(N+1)/2 characters, where N is the string length.

> $t = microtime(true); WebRequest::findIE6Extension(str_repeat('.',100000)); print microtime(true)-$t;
2.7449851036072
> $t = microtime(true); WebRequest::findIE6Extension(str_repeat('.',200000)); print microtime(true)-$t;
9.5464141368866
> $t = microtime(true); WebRequest::findIE6Extension(str_repeat('.',400000)); print microtime(true)-$t;
35.981049060822

I'm not sure if you can have a URL that's 100K characters, but even if you can't, this style of string processing is definitely a bad habit to get into. God gave us offset parameters for a reason.

#Comment by Tim Starling (talk | contribs)   00:52, 1 June 2011

Fixed in r89245. But the overblocking issues noted on bug 28840 probably still remain.

#Comment by Catrope (talk | contribs)   13:49, 1 June 2011

Overblocking issues mostly addressed by Tim in r89249.

Thanks for fixing these issues yourself so quickly.

Status & tagging log