r109847 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r109846‎ | r109847 | r109848 >
Date:19:35, 23 January 2012
Author:tstarling
Status:ok (Comments)
Tags:
Comment:
Fix for r109720: replace the last two regexes with plain string functions. The regular expression used for stripping the last path component from the output was inefficient, because PCRE does not optimise "$" anchors correctly. It scans the entire string forwards, instead of scanning backwards starting from the anchor. Passes tests.
Modified paths:
  • /trunk/phase3/includes/GlobalFunctions.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/GlobalFunctions.php
@@ -585,6 +585,7 @@
586586 $prefixLengthTwo = substr( $urlPath, $inputOffset, 2 );
587587 $prefixLengthThree = substr( $urlPath, $inputOffset, 3 );
588588 $prefixLengthFour = substr( $urlPath, $inputOffset, 4 );
 589+ $trimOutput = false;
589590
590591 if ( $prefixLengthTwo == './' ) {
591592 # Step A, remove leading "./"
@@ -604,12 +605,12 @@
605606 # remove last path component in output
606607 $inputOffset += 2;
607608 $urlPath[$inputOffset] = '/';
608 - $output = preg_replace('%(^|/)[^/]*$%', '', $output);
 609+ $trimOutput = true;
609610 } elseif ( $prefixLengthFour == '/../' ) {
610611 # Step C, replace leading "/../" with "/" and
611612 # remove last path component in output
612613 $inputOffset += 3;
613 - $output = preg_replace('%(^|/)[^/]*$%', '', $output);
 614+ $trimOutput = true;
614615 } elseif ( ( $prefixLengthOne == '.' ) && ( $inputOffset + 1 == $inputLength ) ) {
615616 # Step D, remove "^.$"
616617 $inputOffset += 1;
@@ -618,10 +619,28 @@
619620 $inputOffset += 2;
620621 } else {
621622 # Step E, move leading path segment to output
622 - preg_match( '%/?[^/]*%A', $urlPath, $matches, 0, $inputOffset );
623 - $inputOffset += strlen( $matches[0] );
624 - $output .= $matches[0];
 623+ if ( $prefixLengthOne == '/' ) {
 624+ $slashPos = strpos( $urlPath, '/', $inputOffset + 1 );
 625+ } else {
 626+ $slashPos = strpos( $urlPath, '/', $inputOffset );
 627+ }
 628+ if ( $slashPos === false ) {
 629+ $output .= substr( $urlPath, $inputOffset );
 630+ $inputOffset = $inputLength;
 631+ } else {
 632+ $output .= substr( $urlPath, $inputOffset, $slashPos - $inputOffset );
 633+ $inputOffset += $slashPos - $inputOffset;
 634+ }
625635 }
 636+
 637+ if ( $trimOutput ) {
 638+ $slashPos = strrpos( $output, '/' );
 639+ if ( $slashPos === false ) {
 640+ $output = '';
 641+ } else {
 642+ $output = substr( $output, 0, $slashPos );
 643+ }
 644+ }
626645 }
627646
628647 return $output;

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r109720Follow-up 102587 to address performance concerns in wfRemoveDotSegments.gicode04:57, 22 January 2012

Comments

#Comment by Moejoe000 (talk | contribs)   21:23, 23 January 2012

The whole wfRemoveDotSegments seems unreasonably complicated.

The following function (own work, public domain) does the same (here the buffers of RFC3986 are implemented as arrays where all the comparisons are much easier to code) and speeds up by a factor of 4 and does (due to my tests) the same as the original version:

function wfRemoveDotSegments( $urlPath ) {
	$parts = explode( '/', $urlPath );
	$out = array();
	foreach( $parts as $part ) {
		if( $part === '..' ) {
			if( count( $out ) > 0 ) {
				array_pop( $out );		
				if( count( $out ) === 0 ) {
					$out[] = '­';
				}
			}
			$slashAtEnd = true;
		} elseif(  $part === '.' ) {
			$slashAtEnd = true;
		} else {
			$out[] = $part;
			$slashAtEnd = false;
		}
	}
	if( $slashAtEnd ) $out[] = '­';
	return implode( '/', $out );
}
#Comment by Aaron Schulz (talk | contribs)   21:29, 23 January 2012

Au contraire, it clearly should be implemented as a MediaWiki extension wrapped around a custom C zend extension.

#Comment by Platonides (talk | contribs)   21:34, 23 January 2012

That's why MW Parser uses a reversed string in some operations.

Status & tagging log