r89230 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r89229‎ | r89230 | r89231 >
Date:21:52, 31 May 2011
Author:platonides
Status:reverted (Comments)
Tags:
Comment:
The big regex at doMagicLinks deserves being more thoroughly studied.
Modified paths:
  • /trunk/phase3/includes/parser/Parser.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/parser/Parser.php
@@ -1197,7 +1197,7 @@
11981198 (?: [0-9] [\ \-]? ){9} # 9 digits with opt. delimiters
11991199 [0-9Xx] # check digit
12001200 \b)
1201 - )!x', array( &$this, 'magicLinkCallback' ), $text );
 1201+ )!xS', array( &$this, 'magicLinkCallback' ), $text );
12021202 wfProfileOut( __METHOD__ );
12031203 return $text;
12041204 }

Sign-offs

UserFlagDate
Hasharinspected10:14, 1 June 2011

Follow-up revisions

RevisionCommit summaryAuthorDate
r89447Provisional revert of r89230: per CR, benchmarking currently shows that it ma...brion22:27, 3 June 2011

Comments

#Comment by Brion VIBBER (talk | contribs)   22:11, 31 May 2011

Is this for performance reasons? Any before/after benchmarks to demonstrate whether and how much it helps?

#Comment by Platonides (talk | contribs)   22:19, 31 May 2011

I didn't benchmark it. I was dealing with bugzilla:29197 and realised how big it is and that it didn't had /S.

/S is equivalent to doing a pcre_verify, which is currently only useful when the regex can begin with different characters (as this is the case), so it should provide some benefits. And the cost of this extra study should be zero as the resulting regex should be cached by apc (but even without that it is probably worth).

#Comment by Brion VIBBER (talk | contribs)   22:31, 31 May 2011

Be aware that many things done 'for performance' may actually be ineffective or even outright unhelpful; please provide a test benchmark to confirm that performance is in fact equal or better.

#Comment by Hashar (talk | contribs)   10:13, 1 June 2011

We have a benchmarking class to assist you in writing the bench. See the following path for class and working examples:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/benchmarks/

#Comment by Platonides (talk | contribs)   21:58, 3 June 2011

Thanks Hashar. As I first wanted to prove that studied regex were faster than normal ones what I did was to measure the time needed to process eswiki-20110318-pages-meta-current (3236515 revisions).

However, I got the opposite result: the studied regexes turned out to be slightly slower.

I was looking today on running pcre with just a light layer of php_pcre to look for the reason, but it's not ready yet.

class BenchmarkMagicLinks extends DumpIterator {
        public function __construct() {
               parent::__construct();
               $this->mDescription = "Run doMagicLinks on a XML dump";
        }

        public function checkOptions() {
               $this->mParser = new Parser();
               $this->mParser->startExternalParse( Title::newFromText("BenchmarkMagicLinks") , new ParserOptions, Parser::OT_HTML, true );
        }

        public function getDbType() {
                return Maintenance::DB_NONE;
        }

       public function processRevision( $rev ) {
               $this->mParser->doMagicLinks( $rev->getText() );
       }
}
$maintClass = "BenchmarkMagicLinks";
require( RUN_MAINTENANCE_IF_MAIN );
#Comment by Brion VIBBER (talk | contribs)   22:28, 3 June 2011

Thanks for starting on that! I've provisionally reverted this change in r89447 until we can verify some variant of it can improve performance.

Status & tagging log