r60986 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r60985‎ | r60986 | r60987 >
Date:20:54, 12 January 2010
Author:philip
Status:resolved (Comments)
Tags:
Comment:
1. Add a new feature to LanguageConverter, for supporting nested using of manual convert syntax like "-{-{}-}-".
2. Fixed a little bug find in getURLVariant().
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified) (history)
  • /trunk/phase3/languages/LanguageConverter.php (modified) (history)
  • /trunk/phase3/maintenance/parserTests.txt (modified) (history)

Diff [purge]

Index: trunk/phase3/languages/LanguageConverter.php
@@ -192,7 +192,7 @@
193193 // see if the preference is set in the request
194194 $ret = $wgRequest->getText( 'variant' );
195195
196 - if ( $ret ) {
 196+ if ( !$ret ) {
197197 $ret = $wgRequest->getVal( 'uselang' );
198198 }
199199
@@ -539,28 +539,69 @@
540540 }
541541
542542 /**
543 - * Convert a text fragment.
 543+ * Convert a text array.
544544 *
545 - * @param string $text text to be converted
 545+ * @param string $tarray text array to be converted
546546 * @param string $plang preferred variant
547547 * @return string converted text
548548 * @private
549549 */
550 - function convertFragment( $text, $plang ) {
551 - $marked = explode( $this->mMarkup['begin'], $text, 2 );
 550+ function convertArray( $tarray, $plang ) {
 551+ $beginlen = strlen( $this->mMarkup['begin'] );
552552 $converted = '';
 553+ $middle = '';
553554
554 - $converted .= $this->autoConvert( $marked[0], $plang );
 555+ foreach ( $tarray as $text ) {
 556+ // for nested use
 557+ if( $middle ) {
 558+ $text = $middle . $text;
 559+ $middle = '';
 560+ }
555561
556 - if ( array_key_exists( 1, $marked ) ) {
557 - $crule = new ConverterRule( $marked[1], $this );
 562+ // find first and last begin markup(s)
 563+ $firstbegin = strpos( $text, $this->mMarkup['begin'] );
 564+ $lastbegin = strrpos( $text, $this->mMarkup['begin'] );
 565+
 566+ // if $text contains no begin markup,
 567+ // append $text restore end markup to $converted
 568+ if( $firstbegin === false ) {
 569+ $converted .= $text;
 570+ $converted .= $this->mMarkup['end'];
 571+ continue;
 572+ }
 573+
 574+ // split $text into $left and $right,
 575+ // omit the begin markup in $right
 576+ $left = substr( $text, 0, $firstbegin );
 577+ $right = substr( $text, $lastbegin + $beginlen );
 578+
 579+ // always convert $left and append it to $converted
 580+ // for nested case, $left is blank but can also be converted
 581+ $converted .= $this->autoConvert( $left, $plang );
 582+
 583+ // parse and apply manual rule from $right
 584+ $crule = new ConverterRule( $right, $this );
558585 $crule->parse( $plang );
559 - $converted .= $crule->getDisplay();
 586+ $right = $crule->getDisplay();
560587 $this->applyManualConv( $crule );
561 - } else {
562 - $converted .= $this->mMarkup['end'];
 588+
 589+ // if $text contains only one begin markup,
 590+ // append $left and $right to $converted.
 591+ //
 592+ // otherwise it's a nested use like "-{-{}-}-",
 593+ // this should be handled properly.
 594+ if( $firstbegin === $lastbegin ) {
 595+ $converted .= $right;
 596+ }
 597+ else {
 598+ // not omit the first begin markup
 599+ $middle = substr( $text, $firstbegin, $lastbegin - $firstbegin );
 600+ $middle .= $right;
 601+ //print $middle;
 602+ }
563603 }
564 -
 604+ // Remove the last delimiter (wasn't real)
 605+ $converted = substr( $converted, 0, - strlen( $this->mMarkup['end'] ) );
565606 return $converted;
566607 }
567608
@@ -586,14 +627,8 @@
587628 $plang = $this->getPreferredVariant();
588629
589630 $tarray = StringUtils::explode( $this->mMarkup['end'], $text );
590 - $converted = '';
 631+ $converted = $this->convertArray( $tarray, $plang );
591632
592 - foreach ( $tarray as $txt ) {
593 - $converted .= $this->convertFragment( $txt, $plang );
594 - }
595 -
596 - // Remove the last delimiter (wasn't real)
597 - $converted = substr( $converted, 0, - strlen( $this->mMarkup['end'] ) );
598633 return $converted;
599634 }
600635
Index: trunk/phase3/RELEASE-NOTES
@@ -295,6 +295,7 @@
296296 * (bug 22051) Returing false in SpecialContributionsBeforeMainOutput hook now
297297 stops normal output
298298 * Send new password e-mail in users preference language
 299+* LanguageConverter now support nested using of manual convert syntax like "-{-{}-}-"
299300
300301 === Bug fixes in 1.16 ===
301302
Index: trunk/phase3/maintenance/parserTests.txt
@@ -6962,6 +6962,17 @@
69636963 !! end
69646964
69656965 !! test
 6966+Nested using of manual convert syntax
 6967+!! options
 6968+language=zh variant=zh-hk
 6969+!! input
 6970+Nested: -{zh-hans:Hi -{zh-cn:China;zh-sg:Singapore;}-;zh-hant:Hello -{zh-tw:Taiwan;zh-hk:H-{ong}- K-{}-ong;}-;}-!
 6971+!! result
 6972+<p>Nested: Hello Hong Kong!
 6973+</p>
 6974+!! end
 6975+
 6976+!! test
69666977 Do not convert roman numbers to language variants
69676978 !! options
69686979 language=sr variant=sr-ec

Follow-up revisions

RevisionCommit summaryAuthorDate
r60989follow-up r60986. fix for parser test failed.philip22:02, 12 January 2010
r60993follow-up r60986: Revert earlier test change - philip showed me the error of ...mah23:05, 12 January 2010
r61233In LanguageConverter:...tstarling02:36, 19 January 2010

Comments

#Comment by OverlordQ (talk | contribs)   21:08, 12 January 2010

Breaks two parser tests:

Reading tests from "maintenance/parserTests.txt"...
Running test Explicit session-wise language variant mapping (H flag for hide)... FAILED!
--- /tmp/mwParser-1279836420-expected   2010-01-12 15:03:01.000000000 -0600
+++ /tmp/mwParser-1279836420-actual     2010-01-12 15:03:01.000000000 -0600
@@ -1,3 +1,3 @@
 <p>(This should be stripped!)
-Taiwan is Taiwan.
+Taiwan is China.
 </p>
Running test Do not convert roman numbers to language variants... FAILED!
--- /tmp/mwParser-703726761-expected    2010-01-12 15:03:02.000000000 -0600
+++ /tmp/mwParser-703726761-actual      2010-01-12 15:03:02.000000000 -0600
@@ -1,2 +1,2 @@
-<p>Фридрих IV је цар.
+<p>Fridrih IV je car.
 </p>

2 previously passing test(s) now FAILING! :(

  • Explicit session-wise language variant mapping (H flag for hide) [Introduced between 12-Jan-2010 14:59:42, 1.16alpha (r60985) and 12-Jan-2010 15:04:48, 1.16alpha (r60986)]
  • Do not convert roman numbers to language variants [Introduced between 12-Jan-2010 14:59:42, 1.16alpha (r60985) and 12-Jan-2010 15:04:48, 1.16alpha (r60986)]
#Comment by PhiLiP (talk | contribs)   22:02, 12 January 2010

fixed on r60989.

#Comment by Tim Starling (talk | contribs)   06:36, 15 January 2010

That's a very curious method for parsing a recursive structure. The time order is O(N^2), but the "new ConverterRule" line is so slow that you'd need a test case that ran for a minute or so before the O(N^2) term began to dominate.

There's a bug in it which has always been there: an unclosed rule will be treated as a closed rule, so "-{T|hello" will set the title despite being unclosed.

Both problems would be fixed if this were rewritten as a classic recursive descent parser, with depth limited to say 5 or 10 to avoid excessive stack usage.

#Comment by Tim Starling (talk | contribs)   15:06, 18 January 2010

I'm doing the parser rewrite, it's almost ready to commit.

#Comment by Tim Starling (talk | contribs)   02:42, 19 January 2010

Done in r61233, marking this resolved. Please review it, I tried to keep it fairly simple so that you can understand it and take over maintenance.

Status & tagging log