r73196 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r73195‎ | r73196 | r73197 >
Date:07:59, 17 September 2010
Author:tstarling
Status:resolved (Comments)
Tags:
Comment:
In JSMin:
* Don't delete line breaks, it's annoying. Replace multi-line comments with the same number of line breaks, to keep line numbers the same. Tested line break count for jquery.js, it's the same now with or without minification.
* Remove the line break from the start of the output, which is there due to $this->a being initialised to "\n".
* Simplified get() by moving the character translations to __construct(). This enables some useful optimisations.
* Straightened out lookAhead/peek logic, which was poorly ported from C, where it was simulated using getc(stdin).
* Optimised scanning for end of comment, using strcspn() and strpos(). Now does multi-line comments at 14700 KB/s instead of 200 KB/s. Same trick with string and regex literals.
* For clarity, don't use $this->a as a temporary variable in the DELETE_A branch of action(). The only way the loop can exit is by it being the same as $this->b, which it was to start with anyway, so there's no point changing it in every iteration.
* Inlined get() and peek() into next() for a small performance gain. Removed peek() since it is no longer used.
* Overall, time to minify jquery.js reduced from 1.6s to 0.9s.
Modified paths:
  • /trunk/phase3/includes/libs/JSMin.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/libs/JSMin.php
@@ -69,7 +69,7 @@
7070 // -- Public Static Methods --------------------------------------------------
7171
7272 public static function minify( $js ) {
73 - $jsmin = new JSMin( $js );
 73+ $jsmin = new self( $js );
7474 $ret = $jsmin->min();
7575 return $ret;
7676 }
@@ -77,7 +77,10 @@
7878 // -- Public Instance Methods ------------------------------------------------
7979
8080 public function __construct( $input ) {
 81+ // Fix line endings
8182 $this->input = str_replace( "\r\n", "\n", $input );
 83+ // Replace tabs and other control characters (except LF) with spaces
 84+ $this->input = preg_replace( '/[\x00-\x09\x0b-\x1f]/', ' ', $this->input );
8285 $this->inputLength = strlen( $this->input );
8386 }
8487
@@ -94,35 +97,35 @@
9598 protected function action( $d ) {
9699 switch( $d ) {
97100 case self::OUTPUT:
98 - // Output A. Copy B to A. Get the next B.
99101 $this->output .= $this->a;
100102
101103 case self::DELETE_A:
102 - // Copy B to A. Get the next B. (Delete A).
103104 $this->a = $this->b;
104105
105106 if ( $this->a === "'" || $this->a === '"' ) {
 107+ $interestingChars = $this->a . "\\\n";
 108+ $this->output .= $this->a;
106109 for ( ; ; ) {
107 - $this->output .= $this->a;
108 - $this->a = $this->get();
 110+ $runLength = strcspn( $this->input, $interestingChars, $this->inputIndex );
 111+ $this->output .= substr( $this->input, $this->inputIndex, $runLength );
 112+ $this->inputIndex += $runLength;
 113+ $this->a = $this->get();
109114
110115 if ( $this->a === $this->b ) {
111116 break;
112117 }
113118
114 - if ( ord( $this->a ) <= self::ORD_LF ) {
 119+ if ( $this->a === "\n" || $this->a === null ) {
115120 throw new JSMinException( 'Unterminated string literal.' );
116121 }
117122
118123 if ( $this->a === '\\' ) {
119 - $this->output .= $this->a;
120 - $this->a = $this->get();
 124+ $this->output .= $this->a . $this->get();
121125 }
122126 }
123127 }
124128
125129 case self::DELETE_B:
126 - // Get the next B. (Delete B).
127130 $this->b = $this->next();
128131
129132 if ( $this->b === '/' && (
@@ -133,6 +136,9 @@
134137 $this->output .= $this->a . $this->b;
135138
136139 for ( ; ; ) {
 140+ $runLength = strcspn( $this->input, "/\\\n", $this->inputIndex );
 141+ $this->output .= substr( $this->input, $this->inputIndex, $runLength );
 142+ $this->inputIndex += $runLength;
137143 $this->a = $this->get();
138144
139145 if ( $this->a === '/' ) {
@@ -140,7 +146,7 @@
141147 } elseif ( $this->a === '\\' ) {
142148 $this->output .= $this->a;
143149 $this->a = $this->get();
144 - } elseif ( ord( $this->a ) <= self::ORD_LF ) {
 150+ } elseif ( $this->a === "\n" || $this->a === null ) {
145151 throw new JSMinException( 'Unterminated regular expression ' .
146152 'literal.' );
147153 }
@@ -159,27 +165,11 @@
160166 * linefeed.
161167 */
162168 protected function get() {
163 - $c = $this->lookAhead;
164 - $this->lookAhead = null;
165 -
166 - if ( $c === null ) {
167 - if ( $this->inputIndex < $this->inputLength ) {
168 - $c = substr( $this->input, $this->inputIndex, 1 );
169 - $this->inputIndex += 1;
170 - } else {
171 - $c = null;
172 - }
 169+ if ( $this->inputIndex < $this->inputLength ) {
 170+ return $this->input[$this->inputIndex++];
 171+ } else {
 172+ return null;
173173 }
174 -
175 - if ( $c === "\r" ) {
176 - return "\n";
177 - }
178 -
179 - if ( $c === null || $c === "\n" || ord( $c ) >= self::ORD_SPACE ) {
180 - return $c;
181 - }
182 -
183 - return ' ';
184174 }
185175
186176 /**
@@ -212,25 +202,12 @@
213203
214204 case "\n":
215205 switch ( $this->b ) {
216 - case '{':
217 - case '[':
218 - case '(':
219 - case '+':
220 - case '-':
221 - $this->action( self::OUTPUT );
222 - break;
223 -
224206 case ' ':
225207 $this->action( self::DELETE_B );
226208 break;
227209
228210 default:
229 - if ( $this->isAlphaNum( $this->b ) ) {
230 - $this->action( self::OUTPUT );
231 - }
232 - else {
233 - $this->action( self::DELETE_A );
234 - }
 211+ $this->action( self::OUTPUT );
235212 }
236213 break;
237214
@@ -244,29 +221,6 @@
245222
246223 $this->action( self::DELETE_B );
247224 break;
248 -
249 - case "\n":
250 - switch ( $this->a ) {
251 - case '}':
252 - case ']':
253 - case ')':
254 - case '+':
255 - case '-':
256 - case '"':
257 - case "'":
258 - $this->action( self::OUTPUT );
259 - break;
260 -
261 - default:
262 - if ( $this->isAlphaNum( $this->a ) ) {
263 - $this->action( self::OUTPUT );
264 - }
265 - else {
266 - $this->action( self::DELETE_B );
267 - }
268 - }
269 - break;
270 -
271225 default:
272226 $this->action( self::OUTPUT );
273227 break;
@@ -274,44 +228,48 @@
275229 }
276230 }
277231
278 - return $this->output;
 232+ // Remove initial line break
 233+ if ( $this->output[0] !== "\n" ) {
 234+ throw new JSMinException( 'Unexpected lack of line break.' );
 235+ }
 236+ if ( $this->output === "\n" ) {
 237+ return '';
 238+ } else {
 239+ return substr( $this->output, 1 );
 240+ }
279241 }
280242
281243 /**
282 - * Get the next character, excluding comments. peek() is used to see
283 - * if a '/' is followed by a '/' or '*'.
 244+ * Get the next character, excluding comments.
284245 */
285246 protected function next() {
286 - $c = $this->get();
 247+ if ( $this->inputIndex >= $this->inputLength ) {
 248+ return null;
 249+ }
 250+ $c = $this->input[$this->inputIndex++];
287251
 252+ if ( $this->inputIndex >= $this->inputLength ) {
 253+ return $c;
 254+ }
 255+
288256 if ( $c === '/' ) {
289 - switch( $this->peek() ) {
 257+ switch( $this->input[$this->inputIndex] ) {
290258 case '/':
291 - for ( ; ; ) {
292 - $c = $this->get();
293 -
294 - if ( ord( $c ) <= self::ORD_LF ) {
295 - return $c;
296 - }
297 - }
298 -
 259+ $this->inputIndex += strcspn( $this->input, "\n", $this->inputIndex ) + 1;
 260+ return "\n";
299261 case '*':
300 - $this->get();
301 -
302 - for ( ; ; ) {
303 - switch( $this->get() ) {
304 - case '*':
305 - if ( $this->peek() === '/' ) {
306 - $this->get();
307 - return ' ';
308 - }
309 - break;
310 -
311 - case null:
312 - throw new JSMinException( 'Unterminated comment.' );
313 - }
 262+ $endPos = strpos( $this->input, '*/', $this->inputIndex + 1 );
 263+ if ( $endPos === false ) {
 264+ throw new JSMinException( 'Unterminated comment.' );
314265 }
315 -
 266+ $numLines = substr_count( $this->input, "\n", $this->inputIndex,
 267+ $endPos - $this->inputIndex );
 268+ $this->inputIndex = $endPos + 2;
 269+ if ( $numLines ) {
 270+ return str_repeat( "\n", $numLines );
 271+ } else {
 272+ return ' ';
 273+ }
316274 default:
317275 return $c;
318276 }
@@ -319,14 +277,6 @@
320278
321279 return $c;
322280 }
323 -
324 - /**
325 - * Get the next character without getting it.
326 - */
327 - protected function peek() {
328 - $this->lookAhead = $this->get();
329 - return $this->lookAhead;
330 - }
331281 }
332282
333283 // -- Exceptions ---------------------------------------------------------------

Follow-up revisions

RevisionCommit summaryAuthorDate
r73197Fix for r73196: don't use $this->a as a temporary variable, as promised in th...tstarling08:04, 17 September 2010

Comments

#Comment by Trevor Parscal (WMF) (talk | contribs)   21:32, 20 September 2010

Deleting line breaks from the minified output is important.

In the case of ResourceLoader, if you want line-numbers to match up you will need to use debug=true, which bypasses minification altogether. When debug is off ResourceLoader will not just minify, it will also combine with other resources, making line numbers only relevant for the first script. This sort of "keep newlines in place" middle-ground is really just a waste of bandwidth.

#Comment by Trevor Parscal (WMF) (talk | contribs)   16:43, 29 September 2010
#Comment by Trevor Parscal (WMF) (talk | contribs)   23:53, 12 January 2011

1. While this revision makes some measurable performance improvements, it does so at the cost of useful features, namely the ability to remove vertical white space. If vertical whitespace retention really is desired - and I will leave that debate for another time and place - it should not be implemented like this. Bug #25362 requests several special configurations for tuning ResourceLoader's output, one of which is whether to collapse vertical whitespace. Ideally, the changes in this revision that cause whitespace to be retained should be configurable.

2. These changes deviate significantly from both the structure of the standard distribution of JSMin, as well as it's goals. Correcting the PHP port to be a more accurate reflection of the original C code, or making performance improvements that do not significantly modify the way the code works are reasonable adjustments, this is outside of what I think is reasonable. Some of these changes are clever and should be pushed back upstream, others should be backed out.

3. This appears to be micro-optimization gone wrong. This code is simply not run that frequently since the results are aggressively cached at multiple levels. Large scripts like jQuery change very rarely and are subsequently minified very rarely. However I am seeing that jQuery was used as the benchmark, as if that were the amount of time we were going to be saving on every load. The reality is that site and user scripts will be minified failry frequently while others will only be minified after software deployments which are weekly at most, and often more in the order of monthly. If there is ever an observed significant performance problem with magnification we should probably look into hooking into the JSMin PHP extension when available (see: http://www.ypass.net/software/php_jsmin) which is about 25x faster than the standard PHP distribution of JSMin.

#Comment by Trevor Parscal (WMF) (talk | contribs)   00:36, 21 January 2011

Given that this file has now been deleted because of licensing issues, and replaced with code that's twice as fast and actually configurable, I would call this resolved.

Status & tagging log