r61052 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r61051‎ | r61052 | r61053 >
Date:16:18, 14 January 2010
Author:platonides
Status:reverted (Comments)
Tags:
Comment:
* (bug 18765) Increased consistency of bold-italic markup for unbalanced quotes.
The representation of six quotes is now improved (changes the meaning of some dubious markup).
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified) (history)
  • /trunk/phase3/includes/parser/Parser.php (modified) (history)
  • /trunk/phase3/maintenance/parserTests.txt (modified) (history)

Diff [purge]

Index: trunk/phase3/maintenance/parserTests.txt
@@ -1635,8 +1635,8 @@
16361636 *[[Wikipedia:ro:Olteniţa]]
16371637 *[[Wikipedia:ro:Olteniţa]]
16381638 !! result
1639 -<ul><li><a href="http://en.wikipedia.org/wiki/ro:Olteni%C5%A3a" class="extiw" title="wikipedia:ro:Olteniţa">Wikipedia:ro:Olteni&#355;a</a>
1640 -</li><li><a href="http://en.wikipedia.org/wiki/ro:Olteni%C5%A3a" class="extiw" title="wikipedia:ro:Olteniţa">Wikipedia:ro:Olteni&#355;a</a>
 1639+<ul><li><a href="http://en.wikipedia.org/wiki/ro:Olteni%C5%A3a" class="extiw" title="wikipedia:ro:Oltenita">Wikipedia:ro:Olteni&#355;a</a>
 1640+</li><li><a href="http://en.wikipedia.org/wiki/ro:Olteni%C5%A3a" class="extiw" title="wikipedia:ro:Oltenita">Wikipedia:ro:Olteni&#355;a</a>
16411641 </li></ul>
16421642
16431643 !! end
@@ -3394,7 +3394,7 @@
33953395 !! input
33963396 [[Image:foobar.jpg|&#9792;]]
33973397 !! result
3398 -<p><a href="https://www.mediawiki.org/wiki/File:Foobar.jpg" class="image" title="♀"><img alt="♀" src="http://example.com/images/3/3a/Foobar.jpg" width="1941" height="220" /></a>
 3398+<p><a href="https://www.mediawiki.org/wiki/File:Foobar.jpg" class="image" title="?"><img alt="?" src="http://example.com/images/3/3a/Foobar.jpg" width="1941" height="220" /></a>
33993399 </p>
34003400 !! end
34013401
@@ -3537,7 +3537,7 @@
35383538 [[fr:Nourriture]]
35393539 [[zh:&#39135;&#21697;]]
35403540 !! result
3541 -es:Alimento fr:Nourriture zh:食品
 3541+es:Alimento fr:Nourriture zh:??
35423542 !! end
35433543
35443544 ###
@@ -5252,19 +5252,17 @@
52535253 </p>
52545254 !! end
52555255
5256 -# Original result was this:
5257 -# <p><b>bold</b><b>bold<i>bolditalics</i></b>
 5256+# This was the original html, but it has also been
 5257+# <p>'<i>bold'</i><b>bold<i>bolditalics</i></b>
52585258 # </p>
5259 -# While that might be marginally more intuitive, maybe, the six-apostrophe
5260 -# construct is clearly pathological and the result stated here (which is what
5261 -# the parser actually does) is about as reasonable as anything.
 5259+# See bug 18765.
52625260 !!test
52635261 Mixing markup for italics and bold
52645262 !! options
52655263 !! input
52665264 '''bold''''''bold''bolditalics'''''
52675265 !! result
5268 -<p>'<i>bold'</i><b>bold<i>bolditalics</i></b>
 5266+<p><b>bold</b><b>bold<i>bolditalics</i></b>
52695267 </p>
52705268 !! end
52715269
@@ -6746,9 +6744,9 @@
67476745 !! options
67486746 title=[[Dunav]] language=sr
67496747 !! input
6750 -Both [[Dunav]] and [[Дунав]] are names for this river.
 6748+Both [[Dunav]] and [[?????]] are names for this river.
67516749 !! result
6752 -<p>Both <strong class="selflink">Dunav</strong> and <strong class="selflink">Дунав</strong> are names for this river.
 6750+<p>Both <strong class="selflink">Dunav</strong> and <strong class="selflink">?????</strong> are names for this river.
67536751 </p>
67546752 !!end
67556753
@@ -6758,9 +6756,9 @@
67596757 !! options
67606758 language=sr
67616759 !! input
6762 -Main Page can be written as [[Маин Паге]]
 6760+Main Page can be written as [[???? ????]]
67636761 !! result
6764 -<p>Main Page can be written as <a href="https://www.mediawiki.org/wiki/Main_Page" title="Main Page">Маин Паге</a>
 6762+<p>Main Page can be written as <a href="https://www.mediawiki.org/wiki/Main_Page" title="Main Page">???? ????</a>
67656763 </p>
67666764 !!end
67676765
@@ -6770,9 +6768,9 @@
67716769 !! options
67726770 language=sr
67736771 !! input
6774 -[[Main Page]] can be written as [[Маин Паге]] same as [[Маин Паге]].
 6772+[[Main Page]] can be written as [[???? ????]] same as [[???? ????]].
67756773 !! result
6776 -<p><a href="https://www.mediawiki.org/wiki/Main_Page" title="Main Page">Main Page</a> can be written as <a href="https://www.mediawiki.org/wiki/Main_Page" title="Main Page">Маин Паге</a> same as <a href="https://www.mediawiki.org/wiki/Main_Page" title="Main Page">Маин Паге</a>.
 6774+<p><a href="https://www.mediawiki.org/wiki/Main_Page" title="Main Page">Main Page</a> can be written as <a href="https://www.mediawiki.org/wiki/Main_Page" title="Main Page">???? ????</a> same as <a href="https://www.mediawiki.org/wiki/Main_Page" title="Main Page">???? ????</a>.
67776775 </p>
67786776 !!end
67796777
@@ -6782,7 +6780,7 @@
67836781 !! options
67846782 language=sr
67856783 !! input
6786 -{{тест}}
 6784+{{????}}
67876785 !! result
67886786 <p>This is a test template
67896787 </p>
@@ -6794,7 +6792,7 @@
67956793 !! options
67966794 language=sr
67976795 !! input
6798 -{{Template:тест}}
 6796+{{Template:????}}
67996797 !! result
68006798 <p>This is a test template
68016799 </p>
@@ -6806,7 +6804,7 @@
68076805 !! options
68086806 language=sr
68096807 !! input
6810 -{{парамтест|param=foo}}
 6808+{{?????????|param=foo}}
68116809 !! result
68126810 <p>This is a test template with parameter foo
68136811 </p>
@@ -6818,9 +6816,9 @@
68196817 !! options
68206818 language=sr cat
68216819 !! input
6822 -[[Category:МедиаWики Усер'с Гуиде]]
 6820+[[Category:?????W??? ????'? ?????]]
68236821 !! result
6824 -<a href="https://www.mediawiki.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D0%B8%D1%98%D0%B0:MediaWiki_User%27s_Guide" title="Категорија:MediaWiki User's Guide">MediaWiki User's Guide</a>
 6822+<a href="https://www.mediawiki.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D0%B8%D1%98%D0%B0:MediaWiki_User%27s_Guide" title="??????????:MediaWiki User's Guide">MediaWiki User's Guide</a>
68256823 !! end
68266824
68276825
@@ -6843,7 +6841,7 @@
68446842 !! input
68456843 Latinski: -{Ne nuntium necare}-
68466844 !! result
6847 -<p>Латински: Ne nuntium necare
 6845+<p>????????: Ne nuntium necare
68486846 </p>
68496847 !! end
68506848
@@ -6855,7 +6853,7 @@
68566854 !! input
68576855 Latinski: -{Ne nuntium necare}-
68586856 !! result
6859 -<p>Латински: Ne nuntium necare
 6857+<p>????????: Ne nuntium necare
68606858 </p>
68616859 !! end
68626860
@@ -6879,7 +6877,7 @@
68806878 !! input
68816879 == -{Naslov}- ==
68826880 !! result
6883 -<h2><span class="editsection">[<a href="https://www.mediawiki.org/index.php?title=Parser_test&amp;action=edit&amp;section=1" title="Уреди део: Naslov">уреди</a>]</span> <span class="mw-headline" id="-.7BNaslov.7D-"> Naslov </span></h2>
 6881+<h2><span class="editsection">[<a href="https://www.mediawiki.org/index.php?title=Parser_test&amp;action=edit&amp;section=1" title="????? ???: Naslov">?????</a>]</span> <span class="mw-headline" id="-.7BNaslov.7D-"> Naslov </span></h2>
68846882
68856883 !! end
68866884
@@ -6979,7 +6977,7 @@
69806978 !! input
69816979 Fridrih IV je car.
69826980 !! result
6983 -<p>Фридрих IV је цар.
 6981+<p>??????? IV ?? ???.
69846982 </p>
69856983 !! end
69866984
@@ -7056,23 +7054,23 @@
70577055 language=cs
70587056 title=[[Main Page]]
70597057 !! input
7060 -{{PRVNÍVELKÉ:ěščř}}
7061 -{{prvnívelké:ěščř}}
7062 -{{PRVNÍMALÉ:ěščř}}
7063 -{{prvnímalé:ěščř}}
7064 -{{MALÁ:ěščř}}
7065 -{{malá:ěščř}}
7066 -{{VELKÁ:ěščř}}
7067 -{{velká:ěščř}}
 7058+{{PRVNÍVELKÉ:ešcr}}
 7059+{{prvnívelké:ešcr}}
 7060+{{PRVNÍMALÉ:ešcr}}
 7061+{{prvnímalé:ešcr}}
 7062+{{MALÁ:ešcr}}
 7063+{{malá:ešcr}}
 7064+{{VELKÁ:ešcr}}
 7065+{{velká:ešcr}}
70687066 !! result
7069 -<p>Ěščř
7070 -Ěščř
7071 -ěščř
7072 -ěščř
7073 -ěščř
7074 -ěščř
7075 -ĚŠČŘ
7076 -ĚŠČŘ
 7067+<p>Ešcr
 7068+Ešcr
 7069+ešcr
 7070+ešcr
 7071+ešcr
 7072+ešcr
 7073+EŠCR
 7074+EŠCR
70777075 </p>
70787076 !! end
70797077
@@ -7339,7 +7337,7 @@
73407338 !! input
73417339 [http://en.wikipedia.org/]
73427340 !! result
7343 -<p><a href="http://en.wikipedia.org/" class="external autonumber" rel="nofollow">[۱]</a>
 7341+<p><a href="http://en.wikipedia.org/" class="external autonumber" rel="nofollow">[?]</a>
73447342 </p>
73457343 !! end
73467344
@@ -7466,7 +7464,7 @@
74677465 !! input
74687466 /* External links */ removed bogus entries
74697467 !! result
7470 -<span class="autocomment"><a href="https://www.mediawiki.org/wiki/Main_Page#External_links" title="Main Page">→</a>External links: </span> removed bogus entries
 7468+<span class="autocomment"><a href="https://www.mediawiki.org/wiki/Main_Page#External_links" title="Main Page">?</a>External links: </span> removed bogus entries
74717469 !!end
74727470
74737471 !! test
@@ -7476,7 +7474,7 @@
74777475 !! input
74787476 /* External links */ removed bogus entries
74797477 !! result
7480 -<span class="autocomment"><a href="#External_links">→</a>External links: </span> removed bogus entries
 7478+<span class="autocomment"><a href="#External_links">?</a>External links: </span> removed bogus entries
74817479 !!end
74827480
74837481 !! test
@@ -7537,10 +7535,86 @@
75387536 <a href="https://www.mediawiki.org/wiki/Main_Page#section" title="Main Page">#section</a>
75397537 !! end
75407538
 7539+!! test
 7540+Bold/italic markup handled differently depending on leading whitespace (bug 18765)
 7541+!!input
 7542+'''Look at ''this edit'''s complicated bold/italic markup!'''
 7543+
 7544+<!-- Comment -->'''Look at ''this edit'''s complicated bold/italic markup!'''
 7545+
 7546+<span> '''Look at ''this edit'''s complicated bold/italic markup!'''</span>
 7547+
 7548+<nowiki></nowiki> '''Look at ''this edit'''s complicated bold/italic markup!'''
 7549+
 7550+<!-- Hello world---> '''Look at ''this edit'''s complicated bold/italic markup!'''
 7551+
 7552+{|
 7553+| '''Look at ''this edit'''s complicated bold/italic markup!'''
 7554+|}
 7555+
 7556+'''This was Italic'' this was plain''' and this was bold'''
 7557+but '''This is bold'' this is bold italic''' and this is bold'''
 7558+
 7559+<!-- Wishlist: Breaking because <span> and | are treated as text
 7560+<span>'''Look at ''this edit'''s complicated bold/italic markup!'''</span>
 7561+{|
 7562+|'''Look at ''this edit'''s complicated bold/italic markup!'''
 7563+|}
 7564+-->
 7565+!! result
 7566+<p><b>Look at <i>this edit'</i>s complicated bold/italic markup!</b>
 7567+</p><p><b>Look at <i>this edit'</i>s complicated bold/italic markup!</b>
 7568+</p><p><span> <b>Look at <i>this edit'</i>s complicated bold/italic markup!</b></span>
 7569+</p><p> <b>Look at <i>this edit'</i>s complicated bold/italic markup!</b>
 7570+</p>
 7571+<pre><b>Look at <i>this edit'</i>s complicated bold/italic markup!</b>
 7572+</pre>
 7573+<table>
 7574+<tr>
 7575+<td> <b>Look at <i>this edit'</i>s complicated bold/italic markup!</b>
 7576+</td></tr></table>
 7577+<p><b>This was Italic<i> this was plain'</i> and this was bold</b>
 7578+but <b>This is bold<i> this is bold italic'</i> and this is bold</b>
 7579+</p><p><br />
 7580+</p>
 7581+!! end
 7582+
 7583+!! test
 7584+Six quotes
 7585+!!input
 7586+''Italic''''''Bold
 7587+
 7588+'''Bold''BoldItalic''''''Normal
 7589+
 7590+''Italic'''BoldItalic''''''Normal'''''
 7591+
 7592+'''''BoldItalic''''''MoreBoldItalic''
 7593+
 7594+''''''Normal
 7595+!!result
 7596+<p><i>Italic'</i><b>Bold</b>
 7597+</p><p><b>Bold<i>BoldItalic'</i></b>Normal
 7598+</p><p><i>Italic<b>BoldItalic'</b></i>Normal
 7599+</p><p><i><b>BoldItalic</b><b>MoreBoldItalic</b></i>
 7600+</p><p>Normal
 7601+</p>
 7602+!!end
 7603+
 7604+
 7605+!! test
 7606+Too many quotes
 7607+!!input
 7608+I '''like'''''quotes'''''''''''
 7609+!! result
 7610+<p>I <b>like</b><i>quotes''''''</i><b> </b>
 7611+</p>
 7612+!! end
 7613+
 7614+
 7615+
 7616+
75417617 TODO:
75427618 more images
75437619 more tables
75447620 math
7545 -character entities
7546 -and much more
7547 -Try for 100% code coverage
 7621+char
\ No newline at end of file
Index: trunk/phase3/includes/parser/Parser.php
@@ -1114,6 +1114,7 @@
11151115 }
11161116
11171117 /**
 1118+ * Processes bolds and italics on a single line.
11181119 * Helper function for doAllQuotes()
11191120 */
11201121 public function doQuotes( $text ) {
@@ -1139,18 +1140,19 @@
11401141 $arr[$i-1] .= "'";
11411142 $arr[$i] = "'''";
11421143 }
1143 - # If there are more than 5 apostrophes in a row, assume they're all
1144 - # text except for the last 5.
1145 - else if ( strlen( $arr[$i] ) > 5 )
 1144+ # If there are more than 6 apostrophes in a row, assume they're all
 1145+ # text except for the last 6.
 1146+ else if ( strlen( $arr[$i] ) > 6 )
11461147 {
1147 - $arr[$i-1] .= str_repeat( "'", strlen( $arr[$i] ) - 5 );
1148 - $arr[$i] = "'''''";
 1148+ $arr[$i-1] .= str_repeat( "'", strlen( $arr[$i] ) - 6 );
 1149+ $arr[$i] = "''''''";
11491150 }
11501151 # Count the number of occurrences of bold and italics mark-ups.
11511152 # We are not counting sequences of five apostrophes.
11521153 if ( strlen( $arr[$i] ) == 2 ) { $numitalics++; }
11531154 else if ( strlen( $arr[$i] ) == 3 ) { $numbold++; }
11541155 else if ( strlen( $arr[$i] ) == 5 ) { $numitalics++; $numbold++; }
 1156+ else if ( strlen( $arr[$i] ) == 6 ) { $numbold+=2; }
11551157 }
11561158 $i++;
11571159 }
@@ -1162,11 +1164,17 @@
11631165 if ( ( $numbold % 2 == 1 ) && ( $numitalics % 2 == 1 ) )
11641166 {
11651167 $i = 0;
 1168+
 1169+ #These are indexes to the /next/ array entry than the
 1170+ #one holding the text matching the condition.
11661171 $firstsingleletterword = -1;
11671172 $firstmultiletterword = -1;
11681173 $firstspace = -1;
 1174+
11691175 foreach ( $arr as $r )
11701176 {
 1177+ #Filter the "'''". Separators are on odd positions.
 1178+ #$arr[0] will be an empty string if needed.
11711179 if ( ( $i % 2 == 1 ) and ( strlen( $r ) == 3 ) )
11721180 {
11731181 $x1 = substr ($arr[$i-1], -1);
@@ -1175,7 +1183,7 @@
11761184 if ($firstspace == -1) $firstspace = $i;
11771185 } else if ($x2 === ' ') {
11781186 if ($firstsingleletterword == -1) $firstsingleletterword = $i;
1179 - } else {
 1187+ } else if ($arr[$i-1] != "") {
11801188 if ($firstmultiletterword == -1) $firstmultiletterword = $i;
11811189 }
11821190 }
@@ -1205,9 +1213,9 @@
12061214 }
12071215
12081216 # Now let's actually convert our apostrophic mush to HTML!
1209 - $output = '';
1210 - $buffer = '';
1211 - $state = '';
 1217+ $output = ''; #Processed text
 1218+ $buffer = ''; #Content if $state is 'both'
 1219+ $state = ''; #Flags with the order of open tags: '|b|i|bi|ib|both'
12121220 $i = 0;
12131221 foreach ($arr as $r)
12141222 {
@@ -1261,6 +1269,21 @@
12621270 else # ($state == '')
12631271 { $buffer = ''; $state = 'both'; }
12641272 }
 1273+ else if (strlen ($r) == 6)
 1274+ {
 1275+ if ($state === 'b')
 1276+ { $output .= '</b><b>'; $state = 'b'; }
 1277+ else if ($state === 'i')
 1278+ { $output .= '\'</i><b>'; $state = 'b'; }
 1279+ else if ($state === 'bi')
 1280+ { $output .= '\'</i></b>'; $state = ''; }
 1281+ else if ($state === 'ib')
 1282+ { $output .= '\'</b></i>'; $state = ''; }
 1283+ else if ($state === 'both')
 1284+ { $output .= '<i><b>'.$buffer.'</b><b>'; $state = 'ib'; }
 1285+ else # ($state == '')
 1286+ { $buffer = ''; $state = ''; }
 1287+ }
12651288 }
12661289 $i++;
12671290 }
Index: trunk/phase3/RELEASE-NOTES
@@ -704,6 +704,8 @@
705705 * (bug 9794) User rights log entries for foreign user now links to the foreign
706706 user's page if possible
707707 * (bug 14717) Don't load nonexistent CSS fix files for non-Monobook skins
 708+* (bug 18765) Increased consistency of bold-italic markup for unbalanced quotes.
 709+ Improved representation of six quotes (may break existing markup).
708710
709711 == API changes in 1.16 ==
710712

Follow-up revisions

RevisionCommit summaryAuthorDate
r61053Fix UTF-8 broken on r61052platonides16:28, 14 January 2010
r61055(bug 19226) First line renders differently on many UI messages...platonides17:14, 14 January 2010
r61515Cosmetic changes from r61052 comments:...platonides11:57, 26 January 2010
r61525Step 1: Apply attachment 2 from bug 18765.platonides18:55, 26 January 2010
r61551Revert r61528, r61527, r61526, r61525, r61519, r61515, r61053, r61052 (Parser...tstarling02:41, 27 January 2010
r61613Make six quotes the first place to look when looking for a quote to make lite...platonides23:05, 27 January 2010

Comments

#Comment by MaxSem (talk | contribs)   16:21, 14 January 2010

Borked UTF-8

#Comment by Platonides (talk | contribs)   16:29, 14 January 2010

Fixed on r61052

#Comment by Nikerabbit (talk | contribs)   17:50, 14 January 2010

Coding style:

  • There should be space after #-symbol.

I'm ignoring the brace on new line issue since the file is using the other style for now.

#Comment by Happy-melon (talk | contribs)   20:49, 14 January 2010

And don't we use "elseif" rather than "else if"?? Reminds me of the dash polic on enwiki though...

#Comment by Platonides (talk | contribs)   22:23, 14 January 2010

Nikerabbit, noted.

Happy-melon: That section uses "else if"s. Parser.php seems quite confused: $ grep -c "elseif" Parser.php 35

$ grep -c "else if" Parser.php 36


#Comment by Happy-melon (talk | contribs)   22:34, 14 January 2010

Lol. It does have a throwaway mention in Coding conventions, though. Someone should do a find-and-replace throughout...

#Comment by Tim Starling (talk | contribs)   04:44, 26 January 2010

The coding style does indeed fit in nicely with r4549. However, it doesn't fit in with the rest of the file or with modern MW coding conventions, or even with basic rules of good taste. So I would like to see it cleaned up. I'm talking variable names, indenting, braces, comments, etc.

} else if ($arr[$i-1] != "") {

Suggest !==. Reasons on Manual:Coding conventions.

The r4549 algorithm in general has a serious problem with worst-case memory usage, due to the use of preg_split(). For a megabyte of double quotes, I measure a memory usage of 130MB. I'd like to see it replaced by string traversal with strpos/strspn, with the "preliminary" phase converted into a preprocessor or incorporated into the second phase.

I'm not sure what the point of outputting </b><b> is. Tidy will probably remove it. Why don't you just leave it out?

Otherwise seems OK.

#Comment by Platonides (talk | contribs)   11:35, 26 January 2010

I had left just to keep it clearer, since that's what they are requesting. Six quotes in that case doesn't make much sense to begin with.

I do not see those numbers for the preg_split. The 1MB of double quotes takes 2MB and a bit over 2MB are taken by preg_split. This is measuring just the preg_split call on 32 bits with memory_get_usage([true]). 32 bits vs 64 doesn't make a difference either. They just change the base size. The reported usage is much higher in 5.3 than 5.2 but that's probably due to the "Improve memory_get_usage() accuracy".

The alternate implementation of http://bug-attachment.wikimedia.org/attachment.cgi?id=6596 seems to be slightly more memory friendly. It may make sense to use it, at least where APC is available.

#Comment by Platonides (talk | contribs)   11:37, 26 January 2010

Why is the bold spanning comments?

Imagine </b><b> at the point where the bold begins...

#Comment by Tim Starling (talk | contribs)   14:31, 26 January 2010

That's pretty amazing. Maybe you have a special version of preg_split() which stores the string offsets without using any additional memory. I heard that PCRE was developing some special technology for using the normally-hidden 9th bit in every byte.

In my more ordinary PHP:

$ php eval.php
> $i = intval(1048576/3); $text = str_repeat("'' ", $i);

> $m = memory_get_usage();

> $arr = preg_split( "/(''+)/", $text, -1, PREG_SPLIT_DELIM_CAPTURE );

> print memory_get_usage()-$m;
138421072

400 bytes per match.

#Comment by Platonides (talk | contribs)   18:59, 26 January 2010

Look at r61525-r61528 when you wake up, Tim :)

#Comment by P.Copp (talk | contribs)   22:19, 26 January 2010
...
+'''Bold''BoldItalic''''''Normal
+
+''Italic'''BoldItalic''''''Normal'''''
...

The second one looks odd to me; why the five quotes at the end? If you leave them away, you get the following test case

''Italic'''BoldItalic''''''Normal

that now results in

Italic'BoldItalicNormal

which is at least inconsistent with the case above (and also looks kind of counter-intuitive to me).

I think the reason for this behavior is that six quotes are now handled differently in the two passes. In the first pass they are counted as two bold markups, while in the second pass they are sometimes treated like a single quote followed by five quotes.

#Comment by Platonides (talk | contribs)   13:05, 27 January 2010

The five quotes at the end are for balancing.


''Italic'''BoldItalic''''''Normal'''''

would be equivalent to writing:

<i>Italic<b>BoldItalic</b><b>Normal</b></i>


Currently (pre-parser changes) it renders:

<i>Italic'</i>BoldItalic'<i><b>Normal</b></i>

which I find much more unlikely to be the desired result.



''Italic'''BoldItalic''''''Normal

is not balanced.


We have to guess what wanted the user. Old guess is

<i>Italic<b>BoldItalic'</b></i>Normal

and the new one

<i>Italic'</i>BoldItalic<b></b>Normal

I can easily add a rule to prefer the six quotes when choosing the ' to make part of the text.

<i>Italic<b>BoldItalic'</b></i>Normal

Is that what you are requesting?

#Comment by P.Copp (talk | contribs)   15:21, 27 January 2010

I'm referring to the difference the new behavior introduced for the two cases

'''Bold''BoldItalic''''''Normal
''Italic'''BoldItalic''''''Normal

The current parser renders them as

<b>Bold<i>BoldItalic'</i></b>Normal
<i>Italic<b>BoldItalic'</b></i>Normal

while the new version would output

<b>Bold<i>BoldItalic'</i></b>Normal
<i>Italic'</i>BoldItalicNormal

If these cases are balanced or not, depends on the implementation. After all, I don't really see why treating six quotes as <b></b> should be more reasonable than treating them as '</b></i>. Suppose someone writes

''''''This is very important!''''''

The new version would just remove all quotes entirely, which could be quite surprising to the user :)

#Comment by Platonides (talk | contribs)   20:46, 27 January 2010

That's the issue, giving a reasonable solution. But whatever we choose, it will break something.

'''bold''''''bold''bolditalics'''''

rendered as

<b>bold</b><b>bold<i>bolditalics</i></b>

seems more reasonable to me than

'<i>bold'</i><b>bold<i>bolditalics</i></b>

Looking at six quote usage on enwiki, there are instances which are just typos [1] or vandalism.

Some cases wanted a <b></b>, also some templates with '''{{{1|}}}'''.

On others they wanted a quoted bold italic text, like Algol Expressions or Remnants. However, the output isn't right either since one quote is bold italic and the other isn't. It should either affect both or none. There are also instances were they wanted a Item's, which wouldn't be affected by this change (although the ' is put inside the formatting and should be outside).

We could make six quotes when they are around a word to make it bold italic placing extra ' in the inside, but i'm reluctant to such changes.


PS: Crappy links on this comment should show correctly when r61581 goes live.

#Comment by P.Copp (talk | contribs)   21:22, 27 January 2010

Yes, I guess most of the instances are typos and such, however such mistakes are easier to spot if we output _something_, instead of just silently removing the quotes.

Whatever heuristic you choose, it should IMHO be consistent, in a way that the two cases I mentioned in my previous post work symmetrically. Moreover, the following should work the same:

'''Bold''BoldItalic''''''Normal
Foo. '''Bold''BoldItalic''''''Normal

But with the new version we get

<b>Bold<i>BoldItalic'</i></b>Normal
Foo. '<i>Bold</i>BoldItalicNormal
#Comment by Platonides (talk | contribs)   23:11, 27 January 2010

I have made on r61613 six quotes the first candidate when we have a quote too much.

This takes care of given samples. For this last one, output will be:

<b>Bold<i>BoldItalic'</i></b>Normal
Foo. <b>Bold<i>BoldItalic'</i></b>Normal

There will likely be differences for more complex lines though.

Status & tagging log