r61551 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r61551 (previous: r61550, next: r61552)
Date:02:41, 27 January 2010
Author:tstarling
Status:ok
Tags:
Comment:
Revert r61528, r61527, r61526, r61525, r61519, r61515, r61053, r61052 (Parser::doQuotes() rewrite). Lots of issues to discuss, needs more review than I have time to give it pre-1.16. I'll split it out to a branch.
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified)
  • /trunk/phase3/includes/StringUtils.php (modified)
  • /trunk/phase3/includes/parser/Parser.php (modified)
  • /trunk/phase3/maintenance/parserTests.txt (modified)
  • /trunk/phase3/tests/preg_split_test.php (deleted)

Diff

Index: trunk/phase3/maintenance/parserTests.txt
@@ -116,7 +116,7 @@
117117 </li><li> plain<b><i>bold-italic</i>bold</b>plain
118118 </li><li> plain<i>italic<b>bold-italic</b></i>plain
119119 </li><li> plain<b>bold<i>bold-italic</i></b>plain
120 -</li><li> plain l&#39;<i>italic</i>plain
 120+</li><li> plain l'<i>italic</i>plain
121121 </li><li> plain l'<b>bold</b> plain
122122 </li></ul>
123123
@@ -5253,17 +5253,19 @@
52545254 </p>
52555255 !! end
52565256
5257 -# This was the original html, but it has also been
5258 -# <p>'<i>bold'</i><b>bold<i>bolditalics</i></b>
 5257+# Original result was this:
 5258+# <p><b>bold</b><b>bold<i>bolditalics</i></b>
52595259 # </p>
5260 -# See bug 18765.
 5260+# While that might be marginally more intuitive, maybe, the six-apostrophe
 5261+# construct is clearly pathological and the result stated here (which is what
 5262+# the parser actually does) is about as reasonable as anything.
52615263 !!test
52625264 Mixing markup for italics and bold
52635265 !! options
52645266 !! input
52655267 '''bold''''''bold''bolditalics'''''
52665268 !! result
5267 -<p><b>bold</b><b>bold<i>bolditalics</i></b>
 5269+<p>'<i>bold'</i><b>bold<i>bolditalics</i></b>
52685270 </p>
52695271 !! end
52705272
@@ -6415,7 +6417,7 @@
64166418 !! input
64176419 ''' ''x'
64186420 !! result
6419 -<pre>&#39;<i> </i>x'
 6421+<pre>'<i> </i>x'
64206422 </pre>
64216423 !!end
64226424
@@ -7558,82 +7560,6 @@
75597561 <a href="https://www.mediawiki.org/wiki/Main_Page#section" title="Main Page">#section</a>
75607562 !! end
75617563
7562 -!! test
7563 -Bold/italic markup handled differently depending on leading whitespace (bug 18765)
7564 -!!input
7565 -'''Look at ''this edit'''s complicated bold/italic markup!'''
7566 -
7567 -<!-- Comment -->'''Look at ''this edit'''s complicated bold/italic markup!'''
7568 -
7569 -<span> '''Look at ''this edit'''s complicated bold/italic markup!'''</span>
7570 -
7571 -<nowiki></nowiki> '''Look at ''this edit'''s complicated bold/italic markup!'''
7572 -
7573 -<!-- Hello world---> '''Look at ''this edit'''s complicated bold/italic markup!'''
7574 -
7575 -{|
7576 -| '''Look at ''this edit'''s complicated bold/italic markup!'''
7577 -|}
7578 -
7579 -'''This was Italic'' this was plain''' and this was bold'''
7580 -but '''This is bold'' this is bold italic''' and this is bold'''
7581 -
7582 -<!-- Wishlist: Breaking because <span> and | are treated as text
7583 -<span>'''Look at ''this edit'''s complicated bold/italic markup!'''</span>
7584 -{|
7585 -|'''Look at ''this edit'''s complicated bold/italic markup!'''
7586 -|}
7587 -!! result
7588 -<p><b>Look at <i>this edit&#39;</i>s complicated bold/italic markup!</b>
7589 -</p><p><b>Look at <i>this edit&#39;</i>s complicated bold/italic markup!</b>
7590 -</p><p><span> <b>Look at <i>this edit&#39;</i>s complicated bold/italic markup!</b></span>
7591 -</p><p> <b>Look at <i>this edit&#39;</i>s complicated bold/italic markup!</b>
7592 -</p>
7593 -<pre><b>Look at <i>this edit&#39;</i>s complicated bold/italic markup!</b>
7594 -</pre>
7595 -<table>
7596 -<tr>
7597 -<td> <b>Look at <i>this edit&#39;</i>s complicated bold/italic markup!</b>
7598 -</td></tr></table>
7599 -<p><b>This was Italic<i> this was plain&#39;</i> and this was bold</b>
7600 -but <b>This is bold<i> this is bold italic&#39;</i> and this is bold</b>
7601 -</p><p><br />
7602 -</p>
7603 -!! end
7604 -
7605 -!! test
7606 -Six quotes
7607 -!!input
7608 -''Italic''''''Bold
7609 -
7610 -'''Bold''BoldItalic''''''Normal
7611 -
7612 -''Italic'''BoldItalic''''''Normal'''''
7613 -
7614 -'''''BoldItalic''''''MoreBoldItalic''
7615 -
7616 -''''''Normal
7617 -!!result
7618 -<p><i>Italic'</i><b>Bold</b>
7619 -</p><p><b>Bold<i>BoldItalic'</i></b>Normal
7620 -</p><p><i>Italic<b>BoldItalic'</b></i>Normal
7621 -</p><p><i><b>BoldItalic</b><b>MoreBoldItalic</b></i>
7622 -</p><p>Normal
7623 -</p>
7624 -!!end
7625 -
7626 -
7627 -!! test
7628 -Too many quotes
7629 -!!input
7630 -I '''like'''''quotes'''''''''''
7631 -!! result
7632 -<p>I <b>like</b><i>quotes''''''</i><b> </b>
7633 -</p>
7634 -!! end
7635 -
7636 -
76377564 Note: some elements used in these Microdata examples don't work, like <img>
76387565 and <time>.
76397566 !! test
Index: trunk/phase3/tests/preg_split_test.php
@@ -1,24 +0,0 @@
2 -<?php
3 -include "../includes/StringUtils.php";
4 -
5 -$pattern = "/('')+/";
6 -$subject = str_repeat("'' ", 1024*1024 + 7);
7 -
8 -$m = memory_get_usage();
9 -
10 -$ps1 = preg_split($pattern, $subject);
11 -
12 -$r = "";
13 -foreach ($ps1 as $c) {
14 - $r .= $c . "|";
15 -}
16 -echo "Original preg_split: " . md5($r) . " " . (memory_get_usage()-$m) . "\n";
17 -
18 -unset($ps1);
19 -
20 -$r = "";
21 -$ps2 = StringUtils::preg_split($pattern, $subject);
22 -foreach ($ps2 as $c) {
23 - $r .= $c . "|";
24 -}
25 -echo "StringUtils preg_split: " . md5($r) . " " . (memory_get_usage()-$m) . "\n";
Index: trunk/phase3/includes/parser/Parser.php
@@ -213,7 +213,7 @@
214214 * Must not consist of all title characters, or else it will change
215215 * the behaviour of <nowiki> in a link.
216216 */
217 - # $this->mUniqPrefix = "\x07UNIQ" . Parser::getRandomString();
 217+ #$this->mUniqPrefix = "\x07UNIQ" . Parser::getRandomString();
218218 # Changed to \x7f to allow XML double-parsing -- TS
219219 $this->mUniqPrefix = "\x7fUNIQ" . self::getRandomString();
220220
@@ -338,7 +338,7 @@
339339 '/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&nbsp;\\2',
340340 # french spaces, Guillemet-right
341341 '/(\\302\\253) /' => '\\1&nbsp;',
342 - '/&nbsp;(!\s*important)/' => ' \\1', # Beware of CSS magic word !important, bug #11874.
 342+ '/&nbsp;(!\s*important)/' => ' \\1', #Beware of CSS magic word !important, bug #11874.
343343 );
344344 $text = preg_replace( array_keys($fixtags), array_values($fixtags), $text );
345345
@@ -556,7 +556,7 @@
557557 $taglist = implode( '|', $elements );
558558 $start = "/<($taglist)(\\s+[^>]*?|\\s*?)(\/?" . ">)|<(!--)/i";
559559
560 - while ( $text !== '' ) {
 560+ while ( $text != '' ) {
561561 $p = preg_split( $start, $text, 2, PREG_SPLIT_DELIM_CAPTURE );
562562 $stripped .= $p[0];
563563 if( count( $p ) < 5 ) {
@@ -723,11 +723,11 @@
724724 array_push( $tr_history , false );
725725 array_push( $tr_attributes , '' );
726726 array_push( $has_opened_tr , false );
727 - } elseif ( count ( $td_history ) == 0 ) {
 727+ } else if ( count ( $td_history ) == 0 ) {
728728 // Don't do any of the following
729729 $out .= $outLine."\n";
730730 continue;
731 - } elseif ( substr ( $line , 0 , 2 ) === '|}' ) {
 731+ } else if ( substr ( $line , 0 , 2 ) === '|}' ) {
732732 // We are ending a table
733733 $line = '</table>' . substr ( $line , 2 );
734734 $last_tag = array_pop ( $last_tag_history );
@@ -745,7 +745,7 @@
746746 }
747747 array_pop ( $tr_attributes );
748748 $outLine = $line . str_repeat( '</dd></dl>' , $indent_level );
749 - } elseif ( substr ( $line , 0 , 2 ) === '|-' ) {
 749+ } else if ( substr ( $line , 0 , 2 ) === '|-' ) {
750750 // Now we have a table row
751751 $line = preg_replace( '#^\|-+#', '', $line );
752752
@@ -773,7 +773,7 @@
774774 array_push ( $td_history , false );
775775 array_push ( $last_tag_history , '' );
776776 }
777 - elseif ( $first_character === '|' || $first_character === '!' || substr ( $line , 0 , 2 ) === '|+' ) {
 777+ else if ( $first_character === '|' || $first_character === '!' || substr ( $line , 0 , 2 ) === '|+' ) {
778778 // This might be cell elements, td, th or captions
779779 if ( substr ( $line , 0 , 2 ) === '|+' ) {
780780 $first_character = '+';
@@ -818,9 +818,9 @@
819819
820820 if ( $first_character === '|' ) {
821821 $last_tag = 'td';
822 - } elseif ( $first_character === '!' ) {
 822+ } else if ( $first_character === '!' ) {
823823 $last_tag = 'th';
824 - } elseif ( $first_character === '+' ) {
 824+ } else if ( $first_character === '+' ) {
825825 $last_tag = 'caption';
826826 } else {
827827 $last_tag = '';
@@ -835,7 +835,7 @@
836836 // be mistaken as delimiting cell parameters
837837 if ( strpos( $cell_data[0], '[[' ) !== false ) {
838838 $cell = "{$previous}<{$last_tag}>{$cell}";
839 - } elseif ( count ( $cell_data ) == 1 )
 839+ } else if ( count ( $cell_data ) == 1 )
840840 $cell = "{$previous}<{$last_tag}>{$cell_data[0]}";
841841 else {
842842 $attributes = $this->mStripState->unstripBoth( $cell_data[0] );
@@ -1108,59 +1108,100 @@
11091109 }
11101110
11111111 /**
1112 - * Processes bolds and italics on a single line.
11131112 * Helper function for doAllQuotes()
11141113 */
11151114 public function doQuotes( $text ) {
1116 - # Counts the number of occurrences of bold and italics mark-ups.
1117 - self::countBoldAndItalic($text, $numbold, $numitalics);
1118 -
1119 - if ( ( $numbold == 0 ) && ( $numitalics == 0 ) )
 1115+ $arr = preg_split( "/(''+)/", $text, -1, PREG_SPLIT_DELIM_CAPTURE );
 1116+ if ( count( $arr ) == 1 )
11201117 return $text;
11211118 else
11221119 {
 1120+ # First, do some preliminary work. This may shift some apostrophes from
 1121+ # being mark-up to being text. It also counts the number of occurrences
 1122+ # of bold and italics mark-ups.
 1123+ $i = 0;
 1124+ $numbold = 0;
 1125+ $numitalics = 0;
 1126+ foreach ( $arr as $r )
 1127+ {
 1128+ if ( ( $i % 2 ) == 1 )
 1129+ {
 1130+ # If there are ever four apostrophes, assume the first is supposed to
 1131+ # be text, and the remaining three constitute mark-up for bold text.
 1132+ if ( strlen( $arr[$i] ) == 4 )
 1133+ {
 1134+ $arr[$i-1] .= "'";
 1135+ $arr[$i] = "'''";
 1136+ }
 1137+ # If there are more than 5 apostrophes in a row, assume they're all
 1138+ # text except for the last 5.
 1139+ else if ( strlen( $arr[$i] ) > 5 )
 1140+ {
 1141+ $arr[$i-1] .= str_repeat( "'", strlen( $arr[$i] ) - 5 );
 1142+ $arr[$i] = "'''''";
 1143+ }
 1144+ # Count the number of occurrences of bold and italics mark-ups.
 1145+ # We are not counting sequences of five apostrophes.
 1146+ if ( strlen( $arr[$i] ) == 2 ) { $numitalics++; }
 1147+ else if ( strlen( $arr[$i] ) == 3 ) { $numbold++; }
 1148+ else if ( strlen( $arr[$i] ) == 5 ) { $numitalics++; $numbold++; }
 1149+ }
 1150+ $i++;
 1151+ }
 1152+
11231153 # If there is an odd number of both bold and italics, it is likely
11241154 # that one of the bold ones was meant to be an apostrophe followed
11251155 # by italics. Which one we cannot know for certain, but it is more
11261156 # likely to be one that has a single-letter word before it.
11271157 if ( ( $numbold % 2 == 1 ) && ( $numitalics % 2 == 1 ) )
11281158 {
 1159+ $i = 0;
 1160+ $firstsingleletterword = -1;
 1161+ $firstmultiletterword = -1;
 1162+ $firstspace = -1;
 1163+ foreach ( $arr as $r )
 1164+ {
 1165+ if ( ( $i % 2 == 1 ) and ( strlen( $r ) == 3 ) )
 1166+ {
 1167+ $x1 = substr ($arr[$i-1], -1);
 1168+ $x2 = substr ($arr[$i-1], -2, 1);
 1169+ if ($x1 === ' ') {
 1170+ if ($firstspace == -1) $firstspace = $i;
 1171+ } else if ($x2 === ' ') {
 1172+ if ($firstsingleletterword == -1) $firstsingleletterword = $i;
 1173+ } else {
 1174+ if ($firstmultiletterword == -1) $firstmultiletterword = $i;
 1175+ }
 1176+ }
 1177+ $i++;
 1178+ }
11291179
1130 - # This algorithm moves the literal quote at the
1131 - # right of a single word, at the right of a
1132 - # multiletter word or at the right of a space.
1133 - # Otherwise, it does nothing.
1134 - #
1135 - # The original if-based version can be found at
1136 - # http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/parser/Parser.php?revision=61519&view=markup
1137 - #
1138 - # Unlike the original one, here we convert the
1139 - # texty quotes to &#39; which shouldn't matter.
1140 -
1141 - $quoteBalancerReplacements = array(
1142 - "/(?<= [^ ])'''(?!')/"=>"&#39;''",
1143 - "/(?<=[^ '])'''(?!')/"=>"&#39;''",
1144 - "/(^|(?<=[^'])) '''(?!')/"=>" &#39;''");
1145 -
1146 - foreach( $quoteBalancerReplacements as $k => $v) {
1147 - $text = preg_replace($k, $v, $text, 1, $count);
1148 - if ($count != 0)
1149 - break;
 1180+ # If there is a single-letter word, use it!
 1181+ if ($firstsingleletterword > -1)
 1182+ {
 1183+ $arr [ $firstsingleletterword ] = "''";
 1184+ $arr [ $firstsingleletterword-1 ] .= "'";
11501185 }
 1186+ # If not, but there's a multi-letter word, use that one.
 1187+ else if ($firstmultiletterword > -1)
 1188+ {
 1189+ $arr [ $firstmultiletterword ] = "''";
 1190+ $arr [ $firstmultiletterword-1 ] .= "'";
 1191+ }
 1192+ # ... otherwise use the first one that has neither.
 1193+ # (notice that it is possible for all three to be -1 if, for example,
 1194+ # there is only one pentuple-apostrophe in the line)
 1195+ else if ($firstspace > -1)
 1196+ {
 1197+ $arr [ $firstspace ] = "''";
 1198+ $arr [ $firstspace-1 ] .= "'";
 1199+ }
11511200 }
11521201
1153 - # Split in groups of 2, 3, 5 or 6 apostrophes.
1154 - # If there are ever four apostrophes, assume the first is supposed to
1155 - # be text, and the remaining three constitute mark-up for bold text.
1156 - # If there are more than 6 apostrophes in a row, assume they're all
1157 - # text except for the last 6.
1158 - $arr = Stringutils::preg_split( "/('{2,3}(?:''')?)(?!')/", $text, -1, PREG_SPLIT_DELIM_CAPTURE );
1159 -
1160 -
11611202 # Now let's actually convert our apostrophic mush to HTML!
1162 - $output = ''; # Processed text
1163 - $buffer = ''; # Content if $state is 'both'
1164 - $state = ''; # Flags with the order of open tags: '|b|i|bi|ib|both'
 1203+ $output = '';
 1204+ $buffer = '';
 1205+ $state = '';
11651206 $i = 0;
11661207 foreach ($arr as $r)
11671208 {
@@ -1177,58 +1218,43 @@
11781219 {
11791220 if ($state === 'i')
11801221 { $output .= '</i>'; $state = ''; }
1181 - elseif ($state === 'bi')
 1222+ else if ($state === 'bi')
11821223 { $output .= '</i>'; $state = 'b'; }
1183 - elseif ($state === 'ib')
 1224+ else if ($state === 'ib')
11841225 { $output .= '</b></i><b>'; $state = 'b'; }
1185 - elseif ($state === 'both')
 1226+ else if ($state === 'both')
11861227 { $output .= '<b><i>'.$buffer.'</i>'; $state = 'b'; }
11871228 else # $state can be 'b' or ''
11881229 { $output .= '<i>'; $state .= 'i'; }
11891230 }
1190 - elseif (strlen ($r) == 3)
 1231+ else if (strlen ($r) == 3)
11911232 {
11921233 if ($state === 'b')
11931234 { $output .= '</b>'; $state = ''; }
1194 - elseif ($state === 'bi')
 1235+ else if ($state === 'bi')
11951236 { $output .= '</i></b><i>'; $state = 'i'; }
1196 - elseif ($state === 'ib')
 1237+ else if ($state === 'ib')
11971238 { $output .= '</b>'; $state = 'i'; }
1198 - elseif ($state === 'both')
 1239+ else if ($state === 'both')
11991240 { $output .= '<i><b>'.$buffer.'</b>'; $state = 'i'; }
12001241 else # $state can be 'i' or ''
12011242 { $output .= '<b>'; $state .= 'b'; }
12021243 }
1203 - elseif (strlen ($r) == 5)
 1244+ else if (strlen ($r) == 5)
12041245 {
12051246 if ($state === 'b')
12061247 { $output .= '</b><i>'; $state = 'i'; }
1207 - elseif ($state === 'i')
 1248+ else if ($state === 'i')
12081249 { $output .= '</i><b>'; $state = 'b'; }
1209 - elseif ($state === 'bi')
 1250+ else if ($state === 'bi')
12101251 { $output .= '</i></b>'; $state = ''; }
1211 - elseif ($state === 'ib')
 1252+ else if ($state === 'ib')
12121253 { $output .= '</b></i>'; $state = ''; }
1213 - elseif ($state === 'both')
 1254+ else if ($state === 'both')
12141255 { $output .= '<i><b>'.$buffer.'</b></i>'; $state = ''; }
12151256 else # ($state == '')
12161257 { $buffer = ''; $state = 'both'; }
12171258 }
1218 - elseif (strlen ($r) == 6)
1219 - {
1220 - if ($state === 'b')
1221 - { $output .= '</b><b>'; $state = 'b'; }
1222 - elseif ($state === 'i')
1223 - { $output .= '\'</i><b>'; $state = 'b'; }
1224 - elseif ($state === 'bi')
1225 - { $output .= '\'</i></b>'; $state = ''; }
1226 - elseif ($state === 'ib')
1227 - { $output .= '\'</b></i>'; $state = ''; }
1228 - elseif ($state === 'both')
1229 - { $output .= '<i><b>'.$buffer.'</b><b>'; $state = 'ib'; }
1230 - else # ($state == '')
1231 - { $buffer = ''; $state = ''; }
1232 - }
12331259 }
12341260 $i++;
12351261 }
@@ -1247,57 +1273,6 @@
12481274 }
12491275
12501276 /**
1251 - * Counts the number of bold and italic items from a line of text.
1252 - * Helper function for doQuotes()
1253 - */
1254 - private static function countBoldAndItalic($text, &$numBold, &$numItalics) {
1255 - $numBold = 0;
1256 - $numItalics = 0;
1257 - $offset = 0;
1258 -
1259 - do {
1260 - $offset = strpos($text, "'", $offset);
1261 - if ($offset === false)
1262 - return;
1263 -
1264 - $quoteLen = strspn($text, "'", $offset);
1265 - $offset += $quoteLen;
1266 -
1267 - switch ($quoteLen) {
1268 - case 0:
1269 - case 1:
1270 - break;
1271 -
1272 - case 2:
1273 - $numItalics++;
1274 - break;
1275 -
1276 - case 3:
1277 - $numBold++;
1278 - break;
1279 -
1280 - case 4:
1281 - # If there are ever four apostrophes, assume the first is supposed to
1282 - # be text, and the remaining three constitute mark-up for bold text.
1283 - $numBold++;
1284 - $numItalics++;
1285 - break;
1286 -
1287 - case 5:
1288 - $numItalics++;
1289 - $numBold++;
1290 - break;
1291 -
1292 - case 6:
1293 - default:
1294 - # If there are more than 6 apostrophes in a row, assume they're all
1295 - # text except for the last 6.
1296 - $numBold+=2;
1297 - }
1298 - } while (true);
1299 - }
1300 -
1301 - /**
13021277 * Replace external links (REL)
13031278 *
13041279 * Note: this is all very hackish and the order of execution matters a lot.
@@ -1538,9 +1513,9 @@
15391514 $sk = $this->mOptions->getSkin();
15401515 $holders = new LinkHolderArray( $this );
15411516
1542 - # split the entire text string on occurences of [[
 1517+ #split the entire text string on occurences of [[
15431518 $a = StringUtils::explode( '[[', ' ' . $s );
1544 - # get the first element (all text up to first [[), and remove the space we added
 1519+ #get the first element (all text up to first [[), and remove the space we added
15451520 $s = $a->current();
15461521 $a->next();
15471522 $line = $a->current(); # Workaround for broken ArrayIterator::next() that returns "void"
@@ -1685,10 +1660,10 @@
16861661
16871662 if ( $might_be_img ) { # if this is actually an invalid link
16881663 wfProfileIn( __METHOD__."-might_be_img" );
1689 - if ( $ns == NS_FILE && $noforce ) { # but might be an image
 1664+ if ( $ns == NS_FILE && $noforce ) { #but might be an image
16901665 $found = false;
16911666 while ( true ) {
1692 - # look at the next 'line' to see if we can close it there
 1667+ #look at the next 'line' to see if we can close it there
16931668 $a->next();
16941669 $next_line = $a->current();
16951670 if ( $next_line === false || $next_line === null ) {
@@ -1702,24 +1677,24 @@
17031678 $trail = $m[2];
17041679 break;
17051680 } elseif ( count( $m ) == 2 ) {
1706 - # if there's exactly one ]] that's fine, we'll keep looking
 1681+ #if there's exactly one ]] that's fine, we'll keep looking
17071682 $text .= "[[{$m[0]}]]{$m[1]}";
17081683 } else {
1709 - # if $next_line is invalid too, we need look no further
 1684+ #if $next_line is invalid too, we need look no further
17101685 $text .= '[[' . $next_line;
17111686 break;
17121687 }
17131688 }
17141689 if ( !$found ) {
17151690 # we couldn't find the end of this imageLink, so output it raw
1716 - # but don't ignore what might be perfectly normal links in the text we've examined
 1691+ #but don't ignore what might be perfectly normal links in the text we've examined
17171692 $holders->merge( $this->replaceInternalLinks2( $text ) );
17181693 $s .= "{$prefix}[[$link|$text";
17191694 # note: no $trail, because without an end, there *is* no trail
17201695 wfProfileOut( __METHOD__."-might_be_img" );
17211696 continue;
17221697 }
1723 - } else { # it's not an image, so output it raw
 1698+ } else { #it's not an image, so output it raw
17241699 $s .= "{$prefix}[[$link|$text";
17251700 # note: no $trail, because without an end, there *is* no trail
17261701 wfProfileOut( __METHOD__."-might_be_img" );
@@ -1796,7 +1771,7 @@
17971772 }
17981773
17991774 # Self-link checking
1800 - if( $nt->getFragment() === '' && $ns !== NS_SPECIAL ) {
 1775+ if( $nt->getFragment() === '' && $ns != NS_SPECIAL ) {
18011776 if( in_array( $nt->getPrefixedText(), $selflink, true ) ) {
18021777 $s .= $prefix . $sk->makeSelfLinkObj( $nt, $text, '', $trail );
18031778 continue;
@@ -1916,7 +1891,7 @@
19171892 */
19181893 /* private */ function closeParagraph() {
19191894 $result = '';
1920 - if ( $this->mLastSection !== '' ) {
 1895+ if ( $this->mLastSection != '' ) {
19211896 $result = '</' . $this->mLastSection . ">\n";
19221897 }
19231898 $this->mInPre = false;
@@ -1932,7 +1907,7 @@
19331908 if ( $fl < $shorter ) { $shorter = $fl; }
19341909
19351910 for ( $i = 0; $i < $shorter; ++$i ) {
1936 - if ( $st1{$i} !== $st2{$i} ) { break; }
 1911+ if ( $st1{$i} != $st2{$i} ) { break; }
19371912 }
19381913 return $i;
19391914 }
@@ -2105,7 +2080,7 @@
21062081 '<td|<th|<\\/?div|<hr|<\\/pre|<\\/p|'.$this->mUniqPrefix.'-pre|<\\/li|<\\/ul|<\\/ol|<\\/?center)/iS', $t );
21072082 if ( $openmatch or $closematch ) {
21082083 $paragraphStack = false;
2109 - # TODO bug 5718: paragraph closed
 2084+ # TODO bug 5718: paragraph closed
21102085 $output .= $this->closeParagraph();
21112086 if ( $preOpenMatch and !$preCloseMatch ) {
21122087 $this->mInPre = true;
@@ -2115,8 +2090,8 @@
21162091 } else {
21172092 $inBlockElem = true;
21182093 }
2119 - } elseif ( !$inBlockElem && !$this->mInPre ) {
2120 - if ( ' ' == substr( $t, 0, 1 ) and ( $this->mLastSection === 'pre' or trim($t) !== '' ) ) {
 2094+ } else if ( !$inBlockElem && !$this->mInPre ) {
 2095+ if ( ' ' == substr( $t, 0, 1 ) and ( $this->mLastSection === 'pre' or trim($t) != '' ) ) {
21212096 // pre
21222097 if ($this->mLastSection !== 'pre') {
21232098 $paragraphStack = false;
@@ -2145,7 +2120,7 @@
21462121 $output .= $paragraphStack;
21472122 $paragraphStack = false;
21482123 $this->mLastSection = 'p';
2149 - } elseif ($this->mLastSection !== 'p') {
 2124+ } else if ($this->mLastSection !== 'p') {
21502125 $output .= $this->closeParagraph().'<p>';
21512126 $this->mLastSection = 'p';
21522127 }
@@ -2166,7 +2141,7 @@
21672142 $output .= $this->closeList( $prefix2[$prefixLength-1] );
21682143 --$prefixLength;
21692144 }
2170 - if ( $this->mLastSection !== '' ) {
 2145+ if ( $this->mLastSection != '' ) {
21712146 $output .= '</' . $this->mLastSection . '>';
21722147 $this->mLastSection = '';
21732148 }
@@ -2972,7 +2947,7 @@
29732948 $isHTML = true;
29742949 $this->disableCache();
29752950 }
2976 - } elseif ( $wgNonincludableNamespaces && in_array( $title->getNamespace(), $wgNonincludableNamespaces ) ) {
 2951+ } else if ( $wgNonincludableNamespaces && in_array( $title->getNamespace(), $wgNonincludableNamespaces ) ) {
29772952 $found = false; //access denied
29782953 wfDebug( __METHOD__.": template inclusion denied for " . $title->getPrefixedDBkey() );
29792954 } else {
@@ -3585,7 +3560,7 @@
35863561 if (preg_match("/^$markerRegex/", $headline, $markerMatches)) {
35873562 $serial = $markerMatches[1];
35883563 list( $titleText, $sectionIndex ) = $this->mHeadings[$serial];
3589 - $isTemplate = ($titleText !== $baseTitleText);
 3564+ $isTemplate = ($titleText != $baseTitleText);
35903565 $headline = preg_replace("/^$markerRegex/", "", $headline);
35913566 }
35923567
@@ -3701,7 +3676,7 @@
37023677 if ( $legacyHeadline == $safeHeadline ) {
37033678 # No reason to have both (in fact, we can't)
37043679 $legacyHeadline = false;
3705 - } elseif ( $legacyHeadline !== Sanitizer::escapeId(
 3680+ } elseif ( $legacyHeadline != Sanitizer::escapeId(
37063681 $legacyHeadline, 'xml' ) ) {
37073682 # The legacy id is invalid XML. We used to allow this, but
37083683 # there's no reason to do so anymore. Backward
@@ -3875,8 +3850,8 @@
38763851 else
38773852 continue;
38783853 }
3879 - if ( $s['index'] !== $section ||
3880 - $s['fromtitle'] !== $titletext ) {
 3854+ if ( $s['index'] != $section ||
 3855+ $s['fromtitle'] != $titletext ) {
38813856 self::incrementNumbering( $numbering,
38823857 $s['toclevel'], $lastLevel );
38833858
@@ -3927,7 +3902,7 @@
39283903 private static function incrementNumbering( &$number, $level, $lastLevel ) {
39293904 if ( $level > $lastLevel )
39303905 $number[$level - 1] = 1;
3931 - elseif ( $level < $lastLevel ) {
 3906+ else if ( $level < $lastLevel ) {
39323907 foreach ( $number as $key => $unused )
39333908 if ( $key >= $level )
39343909 unset( $number[$key] );
@@ -4037,7 +4012,7 @@
40384013 $m = array();
40394014 if ( preg_match( "/^($nc+:|)$tc+?( \\($tc+\\))$/", $t, $m ) ) {
40404015 $text = preg_replace( $p2, "[[$m[1]\\1$m[2]|\\1]]", $text );
4041 - } elseif ( preg_match( "/^($nc+:|)$tc+?(, $tc+|)$/", $t, $m ) && "$m[1]$m[2]" !== '' ) {
 4016+ } elseif ( preg_match( "/^($nc+:|)$tc+?(, $tc+|)$/", $t, $m ) && "$m[1]$m[2]" != '' ) {
40424017 $text = preg_replace( $p2, "[[$m[1]\\1$m[2]|\\1]]", $text );
40434018 } else {
40444019 # if there's no context, don't bother duplicating the title
@@ -4876,7 +4851,7 @@
48774852 if ( $node->getName() === 'h' ) {
48784853 $bits = $node->splitHeading();
48794854 $curLevel = $bits['level'];
4880 - if ( $bits['i'] !== $sectionIndex && $curLevel <= $targetLevel ) {
 4855+ if ( $bits['i'] != $sectionIndex && $curLevel <= $targetLevel ) {
48814856 break;
48824857 }
48834858 }
@@ -4892,7 +4867,7 @@
48934868 // Add two newlines on -- trailing whitespace in $newText is conventionally
48944869 // stripped by the editor, so we need both newlines to restore the paragraph gap
48954870 // Only add trailing whitespace if there is newText
4896 - if($newText !== "") {
 4871+ if($newText != "") {
48974872 $outText .= $newText . "\n\n";
48984873 }
48994874
Index: trunk/phase3/includes/StringUtils.php
@@ -179,14 +179,6 @@
180180 return new ArrayIterator( explode( $separator, $subject ) );
181181 }
182182 }
183 -
184 - /**
185 - * Workalike for preg_split() with limited memory usage.
186 - * Returns an Iterator
187 - */
188 - static function preg_split( $pattern, $subject, $limit = -1, $flags = 0 ) {
189 - return new PregSplitIterator( $pattern, $subject, $limit, $flags );
190 - }
191183 }
192184
193185 /**
@@ -417,82 +409,3 @@
418410 }
419411 }
420412
421 -
422 -/**
423 - * An iterator which works exactly like:
424 - *
425 - * foreach ( preg_split( $pattern, $s, $limit, $flags ) as $element ) {
426 - * ...
427 - * }
428 - *
429 - * Except it doesn't use huge amounts of memory when $limit is -1
430 - *
431 - * The flag PREG_SPLIT_OFFSET_CAPTURE isn't supported.
432 - */
433 -class PregSplitIterator implements Iterator {
434 - // The subject string
435 - var $pattern, $subject, $originalLimit, $flags;
436 -
437 - // The last extracted group of items.
438 - var $smallArray;
439 -
440 - // The position on the iterator.
441 - var $curPos;
442 -
443 - const MAX_LIMIT = 100;
444 -
445 - /**
446 - * Construct a PregSplitIterator
447 - */
448 - function __construct( $pattern, $s, $limit, $flags) {
449 - $this->pattern = $pattern;
450 - $this->subject = $s;
451 - $this->originalLimit = $limit;
452 - $this->flags = $flags;
453 -
454 - $this->rewind();
455 - }
456 -
457 - private function effectiveLimit() {
458 - if ($this->originalLimit == -1) {
459 - return self::MAX_LIMIT + 1;
460 - } else if ($this->limit > self::MAX_LIMIT) {
461 - $this->limit -= self::MAX_LIMIT;
462 - return self::MAX_LIMIT + 1;
463 - } else {
464 - $old = $this->limit;
465 - $this->limit = 0;
466 - return $old;
467 - }
468 - }
469 -
470 - function rewind() {
471 - $this->curPos = 0;
472 - $this->limit = $this->originalLimit;
473 - if ($this->limit == -1) $this->limit = self::MAX_LIMIT;
474 - $this->smallArray = preg_split( $this->pattern, $this->subject, $this->effectiveLimit(), $this->flags);
475 - }
476 -
477 - function current() {
478 - return $this->smallArray[$this->curPos % self::MAX_LIMIT];
479 - }
480 -
481 - function key() {
482 - return $this->curPos;
483 - }
484 -
485 - function next() {
486 - $this->curPos++;
487 - if ( $this->curPos % self::MAX_LIMIT == 0 ) {
488 - # Last item contains the rest unsplitted.
489 - if ($this->limit > 0) {
490 - $this->smallArray = preg_split( $this->pattern, $this->smallArray[self::MAX_LIMIT], $this->effectiveLimit(), $this->flags);
491 - }
492 - }
493 - return;
494 - }
495 -
496 - function valid() {
497 - return $this->curPos % self::MAX_LIMIT < count($this->smallArray);
498 - }
499 -}
Index: trunk/phase3/RELEASE-NOTES
@@ -711,8 +711,6 @@
712712 * (bug 9794) User rights log entries for foreign user now links to the foreign
713713 user's page if possible
714714 * (bug 14717) Don't load nonexistent CSS fix files for non-Monobook skins
715 -* (bug 18765) Increased consistency of bold-italic markup for unbalanced quotes.
716 - Improved representation of six quotes (may break existing markup).
717715 * (bug 22034) Use wfClientAcceptsGzip() in wfGzipHandler instead of
718716 reimplementing it.
719717 * (bug 19226) First line renders differently on many UI messages.
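
Note on the restored Parser::doQuotes() logic: the two preliminary passes that this revert brings back in Parser.php (normalising runs of apostrophes, then rebalancing when both the bold and the italic counts are odd) can be sketched in isolation as below. This is only an illustration written from the diff, not the committed code: the function name sketchBalanceQuotes() and the variables $single, $multi, $space and $pick are hypothetical, and the final conversion of tokens to HTML is left out.

```php
<?php
/**
 * Illustrative only: hypothetical stand-alone helper mirroring the two
 * preliminary passes of Parser::doQuotes() as restored by this revision.
 * Returns the token array: even indexes are text, odd indexes are mark-up.
 */
function sketchBalanceQuotes( $text ) {
	$arr = preg_split( "/(''+)/", $text, -1, PREG_SPLIT_DELIM_CAPTURE );
	$numbold = 0;
	$numitalics = 0;

	// Pass 1: turn surplus apostrophes into text, then count the mark-up.
	for ( $i = 1; $i < count( $arr ); $i += 2 ) {
		$len = strlen( $arr[$i] );
		if ( $len == 4 ) {
			// Four apostrophes: the first is text, the other three mean bold.
			$arr[$i - 1] .= "'";
			$arr[$i] = "'''";
		} elseif ( $len > 5 ) {
			// More than five: everything except the last five is text.
			$arr[$i - 1] .= str_repeat( "'", $len - 5 );
			$arr[$i] = "'''''";
		}
		$len = strlen( $arr[$i] );
		if ( $len == 2 ) {
			$numitalics++;
		} elseif ( $len == 3 ) {
			$numbold++;
		} elseif ( $len == 5 ) {
			$numitalics++;
			$numbold++;
		}
	}

	// Pass 2: only needed when both counts are odd. One bold marker is
	// reinterpreted as a literal apostrophe followed by italics.
	if ( $numbold % 2 == 1 && $numitalics % 2 == 1 ) {
		$single = $multi = $space = -1;
		for ( $i = 1; $i < count( $arr ); $i += 2 ) {
			if ( strlen( $arr[$i] ) != 3 ) {
				continue;
			}
			$x1 = substr( $arr[$i - 1], -1 );
			$x2 = substr( $arr[$i - 1], -2, 1 );
			if ( $x1 === ' ' ) {
				if ( $space == -1 ) { $space = $i; }
			} elseif ( $x2 === ' ' ) {
				if ( $single == -1 ) { $single = $i; }
			} elseif ( $multi == -1 ) {
				$multi = $i;
			}
		}
		// Preference order: single-letter word, multi-letter word, space.
		$pick = $single > -1 ? $single : ( $multi > -1 ? $multi : $space );
		if ( $pick > -1 ) {
			$arr[$pick] = "''";
			$arr[$pick - 1] .= "'";
		}
	}
	return $arr;
}

// Input taken from the "Mixing markup for italics and bold" parser test.
$tokens = sketchBalanceQuotes( "'''bold''''''bold''bolditalics'''''" );
echo implode( '|', $tokens ), "\n";
```

For the parser-test input used here, the six-apostrophe run is reduced to one literal apostrophe plus five mark-up apostrophes, which leaves three bold and three italic markers. Pass 2 then rewrites the leading ''' as a literal apostrophe followed by '', which agrees with the expected result <p>'<i>bold'</i><b>bold<i>bolditalics</i></b> restored in parserTests.txt.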

Follow-up revisions

Revision | Commit summary | Author | Date
r61554 | Merged in local changes, by reverse merging r61551 except without the unrelat... | tstarling | 02:55, 27 January 2010

Past revisions this follows-up on

Revision | Commit summary | Author | Date
r61052 | * (bug 18765) Increased consistency of bold-italic markup for unbalanced quotes.... | platonides | 16:18, 14 January 2010
r61053 | Fix UTF-8 broken on r61052 | platonides | 16:28, 14 January 2010
r61515 | Cosmetic changes from r61052 comments:... | platonides | 11:57, 26 January 2010
r61519 | Parser documentation:... | platonides | 16:16, 26 January 2010
r61525 | Step 1: Apply attachment 2 from bug 18765. | platonides | 18:55, 26 January 2010
r61526 | Step 2: Take out countBoldAndItalic() | platonides | 18:55, 26 January 2010
r61527 | Step 3: Balance the quotes directly on $text... | platonides | 18:56, 26 January 2010
r61528 | Step 4: Profit!!... | platonides | 18:58, 26 January 2010
