r82399 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r82398‎ | r82399 | r82400 >
Date:14:09, 18 February 2011
Author:tstarling
Status:ok
Tags:
Comment:
JavaScriptDistiller fixes:
* Removed some repeated subexpressions, /(.)*/ etc. These cause the stack space to be rapidly exhausted, leading to a segfault for malicious input. I replaced the ([\r\n]|.) subexpressions with a dot with a /s modifier. Added a pcre.recursion_limit hack to take care of the rest.
* Supported assertions and non-capturing subpatterns in ParseMaster::add() by recognising that parentheses that aren't followed by "?" are capturing. Used another assertion to do this.
* Fixed a bug whereby if a single-line comment had a slash in it, it was recognised as a regex instead of a comment. This occurred because the leading whitespace caused the regex regex to match at an earlier string position than the comment regex, giving it undue precedence. This was the subject of several reports on IRC and the main reason for me starting work on JavaScriptDistiller. Used a lookbehind assertion.
* Give comments precedence over regexes and strings where there is ambiguity at the same string location. Not sure if this does anything, but it seemed like a good idea at the time.
* Removed unused variable ParseMaster::$TRIM.
* To test it, I ran the old and new jquery.js output through Google Closure Compiler. The result was the same, proving that there are no functional differences.
Modified paths:
  • /trunk/phase3/includes/libs/JavaScriptDistiller.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/libs/JavaScriptDistiller.php
@@ -19,12 +19,20 @@
2020 * @param $stripVerticalSpace Boolean: Try to remove as much vertical whitespace as possible
2121 */
2222 public static function stripWhiteSpace( $script, $stripVerticalSpace = false ) {
 23+ // Try to avoid segfaulting
 24+ // I saw segfaults with a limit of 10000, 1000 seems to work
 25+ $oldLimit = ini_get( 'pcre.recursion_limit' );
 26+ if ( intval( $oldLimit ) > 1000 ) {
 27+ ini_set( 'pcre.recursion_limit', '1000' );
 28+ }
 29+
2330 $script = self::stripHorizontalSpace( $script );
2431 // If requested, make some vertical whitespace collapsing as well
2532 if ( $stripVerticalSpace ) {
2633 $script = self::stripVerticalSpace( $script );
2734 }
2835 // Done
 36+ ini_set( 'pcre.recursion_limit', $oldLimit );
2937 return $script;
3038 }
3139
@@ -74,55 +82,44 @@
7583 // to \s if we use a backslash as the escape character. We work around this by using an
7684 // obscure escape character that we hope will never appear at the end of a line.
7785 $parser->escapeChar = chr( 1 );
 86+
 87+ // C-style comment: use non-greedy repetition to find the end
 88+ $parser->add( '\/ \* .*? \* \/' );
 89+
 90+ // Preserve the newline after a C++-style comment -- bug 27046
 91+ $parser->add( '\/ \/ [^\r\n]* ( [\r\n] )', '$2' );
 92+
7893 // Protect strings. The original code had [^\'\\v] here, but that didn't armor multiline
7994 // strings correctly. This also armors multiline strings that don't have backslashes at the
8095 // end of the line (these are invalid), but that's fine because we're just armoring here.
8196
8297 // Single quotes
83 - $parser->add(
84 - '\' (' . // start quote
 98+ $parser->add(
 99+ '\'' . // start quote
85100 '[^\'\\\\]*' . // a run of non-special characters
86 - '(' .
87 - '\\\\ ( . | [\r\n] )' . // a backslash followed by a character or line ending
 101+ '(?:' .
 102+ '\\\\ .' . // a backslash followed by a character or line ending
88103 '[^\'\\\\]*' . // a run of non-special characters
89104 ')*' . // any number of the above
90 - ') \'', // end quote
 105+ '\'', // end quote
91106 '$1' );
92107
93108 // Double quotes: same as above
94 - $parser->add( '" ( [^"\\\\]* ( \\\\ ( . | [\r\n] ) [^"\\\\]* )* ) "', '$1' );
 109+ $parser->add( '" [^"\\\\]* (?: \\\\ . [^"\\\\]* )* "', '$1' );
95110
96111 // Protect regular expressions
97112 // Regular expression with whitespace before it
98113 $parser->add(
99 - '[ \t]+ ( ( \/' . // whitespace then start slash
 114+ '(?<= [ \t] | [^\w\$\/\'"*)\?:] )' . // assert that whitespace or punctuation precedes
 115+ '\/' . // start slash
100116 '[^\r\n\*]' . // not a comment-start or line ending
101117 '[^\/\r\n\\\\]*' . // a sequence of non-special characters
102 - '(' .
 118+ '(?:' .
103119 '\\\\.' . // an escaped dot
104120 '[^\/\r\n\\\\]*' . // a sequence of non-special characters
105121 ')*' . // any number of the above
106 - '\/(i|g)*' . // pattern end, optional modifier
107 - ') )',
 122+ '\/[ig]*' , // pattern end, optional modifier
108123 '$1' );
109 - // Regular expression with an operator before it
110 - $parser->add(
111 - '( [^\w\$\/\'"*)\?:] (\/' . // certain kinds of punctuation and then start slash
112 - '[^\r\n\*]' . // not a comment-start or line ending
113 - '[^\/\r\n\\\\]*' . // a sequence of non-special characters
114 - '(' .
115 - '\\\\.' . // an escaped dot
116 - '[^\/\r\n\\\\]*' . // a sequence of non-special characters
117 - ')*' . // any number of the above
118 - '\/(i|g)*)' . // pattern end, optional modifier
119 - ')',
120 - '$1' );
121 -
122 - // C-style comment: use non-greedy repetition to find the end
123 - $parser->add( '\/ \* ( . | [\r\n] )*? \* \/' );
124 -
125 - // Preserve the newline after a C++-style comment -- bug 27046
126 - $parser->add( '\/ \/ [^\r\n]* ( [\r\n] )', '$2' );
127124 return $parser;
128125 }
129126 }
@@ -149,10 +146,9 @@
150147 const LENGTH = 2;
151148
152149 // used to determine nesting levels
153 - private $GROUPS = '/\(/';//g
 150+ private $GROUPS = '/\( (?! \? ) /x';//g
154151 private $SUB_REPLACE = '/\$\d/';
155152 private $INDEXED = '/^\$\d+$/';
156 - //private $TRIM = '/([\'"])\1\.(.*)\.\1\1$/';
157153 private $ESCAPE = '/\\\./';//g
158154 private $QUOTE = '/\'/';
159155 private $DELETED = '/\x01[^\x01]*\x01/';//g
@@ -197,9 +193,9 @@
198194 // simulate the _patterns.toSTring of Dean
199195 $regexp = '/';
200196 foreach ($this->_patterns as $reg) {
201 - $regexp .= '(' . $reg[self::EXPRESSION] . ')|';
 197+ $regexp .= '(' . $reg[self::EXPRESSION] . ")|\n";
202198 }
203 - $regexp = substr($regexp, 0, -1) . '/Sx';
 199+ $regexp = substr($regexp, 0, -2) . '/Sxs';
204200 $regexp .= ($this->ignoreCase) ? 'i' : '';
205201
206202 $string = $this->_escape($string, $this->escapeChar);

Follow-up revisions

RevisionCommit summaryAuthorDate
r82401Remove comment that makes no sense post r82399catrope14:21, 18 February 2011
r824021.17wmf1: MFT r82380, r82384, r82399catrope14:28, 18 February 2011
r85290Per Roan, pull JavaScriptDistiller from trunk since it won't merge cleanly. T...demon22:22, 3 April 2011

Status & tagging log