r80430 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r80429‎ | r80430 | r80431 >
Date:23:57, 16 January 2011
Author:happy-melon
Status:reverted (Comments)
Tags:
Comment:
(bug 529, bug 12974) alter behaviour of the table- and template-parser:
* Remove the hack from bug 529 which inserts a newline when the template text begins with any block character; this breaks many things in many exciting ways. I've left it in for now when the text begins with a wikitable, as that markup can't be mistaken for anything else.
* Instead, move the contents of a table cell onto a new line for parsing, so that linestart elements (including nested tables) will parse as normal.

This means that structures like

{|
| {{template-containing-wikilist}}
|}

Will still work, but for the right reason, and structures like

{|
| style="color:{{template-containing-hexcode}}" | Foo
| * Bar
| {|
| Look at me, I'm nested!
|}
|}

Will all now start to work. Structures like

* Foo {{template-containing-wikilist}}

Will now not, but honestly, should they?
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified) (history)
  • /trunk/phase3/includes/parser/Parser.php (modified) (history)
  • /trunk/phase3/tests/parser/parserTests.txt (modified) (history)

Diff [purge]

Index: trunk/phase3/tests/parser/parserTests.txt
@@ -1210,7 +1210,8 @@
12111211 |}
12121212 !! result
12131213 <table>
1214 -<caption> caption
 1214+<caption>
 1215+caption
12151216 </caption><tr><td></td></tr></table>
12161217
12171218 !! end
@@ -1226,12 +1227,22 @@
12271228 !! result
12281229 <table>
12291230 <tr>
1230 -<td> 1 </td>
1231 -<td> 2
 1231+<td>
 1232+<p>1
 1233+</p>
 1234+</td>
 1235+<td>
 1236+<p>2
 1237+</p>
12321238 </td></tr>
12331239 <tr>
1234 -<td> 3 </td>
1235 -<td> 4
 1240+<td>
 1241+<p>3
 1242+</p>
 1243+</td>
 1244+<td>
 1245+<p>4
 1246+</p>
12361247 </td></tr></table>
12371248
12381249 !! end
@@ -1261,48 +1272,110 @@
12621273 |}
12631274 !! result
12641275 <table border="1" cellpadding="2">
1265 -<caption>Multiplication table
 1276+<caption>
 1277+Multiplication table
12661278 </caption>
12671279 <tr>
1268 -<th> &times; </th>
1269 -<th> 1 </th>
1270 -<th> 2 </th>
1271 -<th> 3
 1280+<th>
 1281+<p>&times;
 1282+</p>
 1283+</th>
 1284+<th>
 1285+<p>1
 1286+</p>
 1287+</th>
 1288+<th>
 1289+<p>2
 1290+</p>
 1291+</th>
 1292+<th>
 1293+<p>3
 1294+</p>
12721295 </th></tr>
12731296 <tr>
1274 -<th> 1
 1297+<th>
 1298+<p>1
 1299+</p>
12751300 </th>
1276 -<td> 1 </td>
1277 -<td> 2 </td>
1278 -<td> 3
 1301+<td>
 1302+<p>1
 1303+</p>
 1304+</td>
 1305+<td>
 1306+<p>2
 1307+</p>
 1308+</td>
 1309+<td>
 1310+<p>3
 1311+</p>
12791312 </td></tr>
12801313 <tr>
1281 -<th> 2
 1314+<th>
 1315+<p>2
 1316+</p>
12821317 </th>
1283 -<td> 2 </td>
1284 -<td> 4 </td>
1285 -<td> 6
 1318+<td>
 1319+<p>2
 1320+</p>
 1321+</td>
 1322+<td>
 1323+<p>4
 1324+</p>
 1325+</td>
 1326+<td>
 1327+<p>6
 1328+</p>
12861329 </td></tr>
12871330 <tr>
1288 -<th> 3
 1331+<th>
 1332+<p>3
 1333+</p>
12891334 </th>
1290 -<td> 3 </td>
1291 -<td> 6 </td>
1292 -<td> 9
 1335+<td>
 1336+<p>3
 1337+</p>
 1338+</td>
 1339+<td>
 1340+<p>6
 1341+</p>
 1342+</td>
 1343+<td>
 1344+<p>9
 1345+</p>
12931346 </td></tr>
12941347 <tr>
1295 -<th> 4
 1348+<th>
 1349+<p>4
 1350+</p>
12961351 </th>
1297 -<td> 4 </td>
1298 -<td> 8 </td>
1299 -<td> 12
 1352+<td>
 1353+<p>4
 1354+</p>
 1355+</td>
 1356+<td>
 1357+<p>8
 1358+</p>
 1359+</td>
 1360+<td>
 1361+<p>12
 1362+</p>
13001363 </td></tr>
13011364 <tr>
1302 -<th> 5
 1365+<th>
 1366+<p>5
 1367+</p>
13031368 </th>
1304 -<td> 5 </td>
1305 -<td> 10 </td>
1306 -<td> 15
 1369+<td>
 1370+<p>5
 1371+</p>
 1372+</td>
 1373+<td>
 1374+<p>10
 1375+</p>
 1376+</td>
 1377+<td>
 1378+<p>15
 1379+</p>
13071380 </td></tr></table>
13081381
13091382 !! end
@@ -1321,16 +1394,26 @@
13221395 !! result
13231396 <table align="right" border="1">
13241397 <tr>
1325 -<td> Cell 1, row 1
 1398+<td>
 1399+<p>Cell 1, row 1
 1400+</p>
13261401 </td>
1327 -<td rowspan="2"> Cell 2, row 1 (and 2)
 1402+<td rowspan="2">
 1403+<p>Cell 2, row 1 (and 2)
 1404+</p>
13281405 </td>
1329 -<td> Cell 3, row 1
 1406+<td>
 1407+<p>Cell 3, row 1
 1408+</p>
13301409 </td></tr>
13311410 <tr>
1332 -<td> Cell 1, row 2
 1411+<td>
 1412+<p>Cell 1, row 2
 1413+</p>
13331414 </td>
1334 -<td> Cell 3, row 2
 1415+<td>
 1416+<p>Cell 3, row 2
 1417+</p>
13351418 </td></tr></table>
13361419
13371420 !! end
@@ -1351,18 +1434,26 @@
13521435 !! result
13531436 <table border="1">
13541437 <tr>
1355 -<td> &alpha;
 1438+<td>
 1439+<p>&alpha;
 1440+</p>
13561441 </td>
13571442 <td>
13581443 <table bgcolor="#ABCDEF" border="2">
13591444 <tr>
1360 -<td>nested
 1445+<td>
 1446+<p>nested
 1447+</p>
13611448 </td></tr>
13621449 <tr>
1363 -<td>table
 1450+<td>
 1451+<p>table
 1452+</p>
13641453 </td></tr></table>
13651454 </td>
1366 -<td>the original table again
 1455+<td>
 1456+<p>the original table again
 1457+</p>
13671458 </td></tr></table>
13681459
13691460 !! end
@@ -1376,7 +1467,9 @@
13771468 !! result
13781469 <table>
13791470 <tr>
1380 -<td>broken
 1471+<td>
 1472+<p>broken
 1473+</p>
13811474 </td></tr></table>
13821475
13831476 !! end
@@ -1390,9 +1483,14 @@
13911484 !! result
13921485 <table>
13931486 <tr>
1394 -<td>[<a href="ftp://%7Cx" class="external free" rel="nofollow">ftp://%7Cx</a></td>
1395 -<td>]" onmouseover="alert(document.cookie)"&gt;test
 1487+<td>
 1488+<p>[<a href="ftp://%7Cx" class="external free" rel="nofollow">ftp://%7Cx</a>
 1489+</p>
13961490 </td>
 1491+<td>
 1492+<p>]" onmouseover="alert(document.cookie)"&gt;test
 1493+</p>
 1494+</td>
13971495 </tr>
13981496 </table>
13991497
@@ -2571,7 +2669,9 @@
25722670 !! result
25732671 <table>
25742672 <tr>
2575 -<td>[[{{{1}}}|{{{2}}}]]
 2673+<td>
 2674+<p>[[{{{1}}}|{{{2}}}]]
 2675+</p>
25762676 </td></tr></table>
25772677
25782678 !! end
@@ -2681,12 +2781,22 @@
26822782 </p>
26832783 <table>
26842784 <tr>
2685 -<td> 1 </td>
2686 -<td> 2
 2785+<td>
 2786+<p>1
 2787+</p>
 2788+</td>
 2789+<td>
 2790+<p>2
 2791+</p>
26872792 </td></tr>
26882793 <tr>
2689 -<td> 3 </td>
2690 -<td> 4
 2794+<td>
 2795+<p>3
 2796+</p>
 2797+</td>
 2798+<td>
 2799+<p>4
 2800+</p>
26912801 </td></tr></table>
26922802
26932803 !! end
@@ -2701,12 +2811,22 @@
27022812 </p>
27032813 <table>
27042814 <tr>
2705 -<td> 1 </td>
2706 -<td> 2
 2815+<td>
 2816+<p>1
 2817+</p>
 2818+</td>
 2819+<td>
 2820+<p>2
 2821+</p>
27072822 </td></tr>
27082823 <tr>
2709 -<td> 3 </td>
2710 -<td> 4
 2824+<td>
 2825+<p>3
 2826+</p>
 2827+</td>
 2828+<td>
 2829+<p>4
 2830+</p>
27112831 </td></tr></table>
27122832
27132833 !! end
@@ -4287,7 +4407,9 @@
42884408 !! result
42894409 <table>
42904410 <tr>
4291 -<th class="awesome"> status
 4411+<th class="awesome">
 4412+<p>status
 4413+</p>
42924414 </th></tr></table>
42934415
42944416 !!end
@@ -4741,7 +4863,9 @@
47424864 !! result
47434865 <table>
47444866 <tr>
4745 -<th style="color:blue"> status
 4867+<th style="color:blue">
 4868+<p>status
 4869+</p>
47464870 </th></tr></table>
47474871
47484872 !!end
@@ -4755,7 +4879,9 @@
47564880 !! result
47574881 <table>
47584882 <tr>
4759 -<th style="/* insecure input */"> status
 4883+<th style="/* insecure input */">
 4884+<p>status
 4885+</p>
47604886 </th></tr></table>
47614887
47624888 !! end
@@ -5424,10 +5550,15 @@
54255551 !! result
54265552 <table>
54275553 <tr>
5428 -<th>https://</th>
5429 -<th></th>
5430 -<th></th>
54315554 <th>
 5555+<p>https://
 5556+</p>
 5557+</th>
 5558+<th>
 5559+</th>
 5560+<th>
 5561+</th>
 5562+<th>
54325563 </td>
54335564 </tr>
54345565 </table>
@@ -5443,7 +5574,9 @@
54445575 !! result
54455576 <table>
54465577 <tr>
5447 -<th> <a href="irc://{{ftp://a" class="external free" rel="nofollow">irc://{{ftp://a</a>" onmouseover="alert('hello world');"
 5578+<th>
 5579+<p><a href="irc://{{ftp://a" class="external free" rel="nofollow">irc://{{ftp://a</a>" onmouseover="alert('hello world');"
 5580+</p>
54485581 </th>
54495582 <td>
54505583 </td>
@@ -5489,7 +5622,9 @@
54905623
54915624 MOVE YOUR MOUSE CURSOR OVER THIS TEXT
54925625 <tr>
5493 -<td></u>
 5626+<td>
 5627+<p></u>
 5628+</p>
54945629 </td>
54955630 </tr>
54965631 </table>
@@ -7604,18 +7739,17 @@
76057740 !!endarticle
76067741
76077742 !! test
7608 -Bug 529: Uncovered bullet
 7743+Bug 529/12974: Uncovered bullet
76097744 !! input
76107745 * Foo {{bullet}}
76117746 !! result
7612 -<ul><li> Foo
7613 -</li><li> Bar
 7747+<ul><li> Foo * Bar
76147748 </li></ul>
76157749
76167750 !! end
76177751
76187752 !! test
7619 -Bug 529: Uncovered table already at line-start
 7753+Bug 529/12974: Uncovered table already at line-start
76207754 !! input
76217755 x
76227756
@@ -7626,24 +7760,33 @@
76277761 </p>
76287762 <table>
76297763 <tr>
7630 -<td> 1 </td>
7631 -<td> 2
 7764+<td>
 7765+<p>1
 7766+</p>
 7767+</td>
 7768+<td>
 7769+<p>2
 7770+</p>
76327771 </td></tr>
76337772 <tr>
7634 -<td> 3 </td>
7635 -<td> 4
 7773+<td>
 7774+<p>3
 7775+</p>
 7776+</td>
 7777+<td>
 7778+<p>4
 7779+</p>
76367780 </td></tr></table>
76377781 <p>y
76387782 </p>
76397783 !! end
76407784
76417785 !! test
7642 -Bug 529: Uncovered bullet in parser function result
 7786+Bug 529/12974: Uncovered bullet in parser function result
76437787 !! input
76447788 * Foo {{lc:{{bullet}} }}
76457789 !! result
7646 -<ul><li> Foo
7647 -</li><li> bar
 7790+<ul><li> Foo * bar
76487791 </li></ul>
76497792
76507793 !! end
Index: trunk/phase3/includes/parser/Parser.php
@@ -825,7 +825,22 @@
826826 $has_opened_tr = array(); # Did this table open a <tr> element?
827827 $indent_level = 0; # indent level of the table
828828
829 - foreach ( $lines as $outLine ) {
 829+ # Keep pulling lines off the front of the array until they're all gone.
 830+ # we want to be able to push lines back on to the front of the stream,
 831+ # but StringUtils::explode() returns funky optimised Iterators which don't
 832+ # support insertion. So maintain a separate buffer and draw on that first if
 833+ # there's anything in it
 834+ $extraLines = array();
 835+ $lines->rewind();
 836+ do {
 837+ if( $extraLines ){
 838+ $outLine = array_shift( $extraLines );
 839+ } elseif( $lines->valid() ) {
 840+ $outLine = $lines->current();
 841+ $lines->next();
 842+ } else {
 843+ break;
 844+ }
830845 $line = trim( $outLine );
831846
832847 if ( $line === '' ) { # empty line, go to next line
@@ -901,11 +916,10 @@
902917 } elseif ( $first_character === '|' || $first_character === '!' || substr( $line , 0 , 2 ) === '|+' ) {
903918 # This might be cell elements, td, th or captions
904919 if ( substr( $line , 0 , 2 ) === '|+' ) {
905 - $first_character = '+';
906 - $line = substr( $line , 1 );
 920+ $first_character = '|+';
907921 }
908922
909 - $line = substr( $line , 1 );
 923+ $line = substr( $line , strlen( $first_character ) );
910924
911925 if ( $first_character === '!' ) {
912926 $line = str_replace( '!!' , '||' , $line );
@@ -916,62 +930,84 @@
917931 # by earlier parser steps, but should avoid splitting up eg
918932 # attribute values containing literal "||".
919933 $cells = StringUtils::explodeMarkup( '||' , $line );
 934+ $cell = array_shift( $cells );
920935
 936+ # Inject cells back into the stream to be dealt with later
 937+ # TODO: really we should do the whole thing as a stream...
 938+ # but that would be too much like a sensible implementation :P
 939+ if( count( $cells ) ){
 940+ foreach( array_reverse( $cells ) as $extraCell ){
 941+ array_unshift( $extraLines, $first_character . $extraCell );
 942+ }
 943+ }
 944+
921945 $outLine = '';
922946
923 - # Loop through each table cell
924 - foreach ( $cells as $cell ) {
925 - $previous = '';
926 - if ( $first_character !== '+' ) {
927 - $tr_after = array_pop( $tr_attributes );
928 - if ( !array_pop( $tr_history ) ) {
929 - $previous = "<tr{$tr_after}>\n";
930 - }
931 - array_push( $tr_history , true );
932 - array_push( $tr_attributes , '' );
933 - array_pop( $has_opened_tr );
934 - array_push( $has_opened_tr , true );
 947+ $previous = '';
 948+ if ( $first_character !== '|+' ) {
 949+ $tr_after = array_pop( $tr_attributes );
 950+ if ( !array_pop( $tr_history ) ) {
 951+ $previous = "<tr{$tr_after}>\n";
935952 }
 953+ array_push( $tr_history , true );
 954+ array_push( $tr_attributes , '' );
 955+ array_pop( $has_opened_tr );
 956+ array_push( $has_opened_tr , true );
 957+ }
936958
937 - $last_tag = array_pop( $last_tag_history );
 959+ $last_tag = array_pop( $last_tag_history );
938960
939 - if ( array_pop( $td_history ) ) {
940 - $previous = "</{$last_tag}>\n{$previous}";
941 - }
 961+ if ( array_pop( $td_history ) ) {
 962+ $previous = "</{$last_tag}>\n{$previous}";
 963+ }
942964
943 - if ( $first_character === '|' ) {
944 - $last_tag = 'td';
945 - } elseif ( $first_character === '!' ) {
946 - $last_tag = 'th';
947 - } elseif ( $first_character === '+' ) {
948 - $last_tag = 'caption';
949 - } else {
950 - $last_tag = '';
951 - }
 965+ if ( $first_character === '|' ) {
 966+ $last_tag = 'td';
 967+ } elseif ( $first_character === '!' ) {
 968+ $last_tag = 'th';
 969+ } elseif ( $first_character === '|+' ) {
 970+ $last_tag = 'caption';
 971+ } else {
 972+ $last_tag = '';
 973+ }
952974
953 - array_push( $last_tag_history , $last_tag );
 975+ array_push( $last_tag_history , $last_tag );
954976
955 - # A cell could contain both parameters and data
956 - $cell_data = explode( '|' , $cell , 2 );
 977+ # A cell could contain both parameters and data... but the pipe could
 978+ # also be the start of a nested table, or a raw pipe inside an invalid
 979+ # link (bug 553).
 980+ $cell_data = preg_split( '/(?<!\{)\|/', $cell, 2 );
957981
958 - # Bug 553: Note that a '|' inside an invalid link should not
959 - # be mistaken as delimiting cell parameters
960 - if ( strpos( $cell_data[0], '[[' ) !== false ) {
961 - $cell = "{$previous}<{$last_tag}>{$cell}";
962 - } elseif ( count( $cell_data ) == 1 ) {
963 - $cell = "{$previous}<{$last_tag}>{$cell_data[0]}";
964 - } else {
965 - $attributes = $this->mStripState->unstripBoth( $cell_data[0] );
966 - $attributes = Sanitizer::fixTagAttributes( $attributes , $last_tag );
967 - $cell = "{$previous}<{$last_tag}{$attributes}>{$cell_data[1]}";
968 - }
 982+ # Bug 553: a '|' inside an invalid link should not
 983+ # be mistaken as delimiting cell parameters
 984+ if ( strpos( $cell_data[0], '[[' ) !== false ) {
 985+ $data = $cell;
 986+ $cell = "{$previous}<{$last_tag}>";
 987+ } elseif ( count( $cell_data ) == 1 ) {
 988+ $cell = "{$previous}<{$last_tag}>";
 989+ $data = $cell_data[0];
 990+ } else {
 991+ $attributes = $this->mStripState->unstripBoth( $cell_data[0] );
 992+ $attributes = Sanitizer::fixTagAttributes( $attributes , $last_tag );
 993+ $cell = "{$previous}<{$last_tag}{$attributes}>";
 994+ $data = $cell_data[1];
 995+ }
969996
970 - $outLine .= $cell;
971 - array_push( $td_history , true );
 997+ # Bug 529: the start of a table cell should be a linestart context for
 998+ # processing other block markup, including nested tables. The original
 999+ # implementation of this was to add a newline before every brace construct,
 1000+ # which broke all manner of other things. Instead, push the contents
 1001+ # of the cell back into the stream and come back to it later. But don't
 1002+ # do that if the first line is empty, or you may get extra whitespace
 1003+ if( $data ){
 1004+ array_unshift( $extraLines, trim( $data ) );
9721005 }
 1006+
 1007+ $outLine .= $cell;
 1008+ array_push( $td_history , true );
9731009 }
9741010 $out .= $outLine . "\n";
975 - }
 1011+ } while( $lines->valid() || count( $extraLines ) );
9761012
9771013 # Closing open td, tr && table
9781014 while ( count( $td_history ) > 0 ) {
@@ -2235,6 +2271,7 @@
22362272 '/(?:<\\/table|<\\/blockquote|<\\/h1|<\\/h2|<\\/h3|<\\/h4|<\\/h5|<\\/h6|'.
22372273 '<td|<th|<\\/?div|<hr|<\\/pre|<\\/p|'.$this->mUniqPrefix.'-pre|<\\/li|<\\/ul|<\\/ol|<\\/?center)/iS', $t );
22382274 if ( $openmatch or $closematch ) {
 2275+
22392276 $paragraphStack = false;
22402277 # TODO bug 5718: paragraph closed
22412278 $output .= $this->closeParagraph();
@@ -3225,11 +3262,11 @@
32263263 $text = wfEscapeWikiText( $text );
32273264 } elseif ( is_string( $text )
32283265 && !$piece['lineStart']
3229 - && preg_match( '/^(?:{\\||:|;|#|\*)/', $text ) )
 3266+ && preg_match( '/^{\\|/', $text ) )
32303267 {
3231 - # Bug 529: if the template begins with a table or block-level
3232 - # element, it should be treated as beginning a new line.
3233 - # This behaviour is somewhat controversial.
 3268+ # Bug 529: if the template begins with a table, it should be treated as
 3269+ # beginning a new line. This previously handled other block-level elements
 3270+ # such as #, :, etc, but these have many false-positives (bug 12974).
32343271 $text = "\n" . $text;
32353272 }
32363273
@@ -3288,7 +3325,7 @@
32893326
32903327 if ( !$title->equals( $cacheTitle ) ) {
32913328 $this->mTplRedirCache[$cacheTitle->getPrefixedDBkey()] =
3292 - array( $title->getNamespace(), $cdb = $title->getDBkey() );
 3329+ array( $title->getNamespace(), $title->getDBkey() );
32933330 }
32943331
32953332 return array( $dom, $title );
Index: trunk/phase3/RELEASE-NOTES
@@ -87,6 +87,11 @@
8888 members.
8989 * (bug 2585) Image pages should send 404 if no image, no shared image and no
9090 description page.
 91+* (bug 12974) The behaviour of wikitable and template parsing was altered. Previously,
 92+ all templates were treated as being at the start of a block context for the purposes
 93+ of evaluating wikitext elements like #, :, *, and tables (bug 529). Now this
 94+ behaviour is only applied to wikitable-start {|, but the first line of a wikitable
 95+ cell is now treated as a linestart.
9196
9297 === API changes in 1.18 ===
9398 * (bug 26339) Throw warning when truncating an overlarge API result

Follow-up revisions

RevisionCommit summaryAuthorDate
r81012Provisionally reverting r80430 (bug 529, 12974)...brion01:16, 26 January 2011

Comments

#Comment by Krinkle (talk | contribs)   12:01, 17 January 2011

elements and other contructs in a {| wikitable |} are now no longer a direct child of a table cell.

Adding <p>-paragraphs to each and every table cell seems a bad idea to me and is probably not semantically correct either. It for one breaks javascript that works with direct child relations in templates.

#Comment by Krinkle (talk | contribs)   12:03, 17 January 2011

However treating stuff in a table cell as "new line" ish ( headings, lists, tables ) sounds great and solves the need for an extra new line. But right now it breaks stuff.

#Comment by Happy-melon (talk | contribs)   13:55, 17 January 2011

The current behaviour is that ‎<p> tags are added to all paragraphs in the cell which begin on a new line, as usual, with the exception that the first cell is not a new line (unless it comes from a template, in which case it is). So

{|
| Foo

Bar
|
Foo

Bar
| {{template-containing-Foo}}

Bar
|}

Becomes (with some tweaks to the whitespace)

<table><tr>

<td>    Foo     <p>Bar</p> </td>
<td> <p>Foo</p> <p>Bar</p> </td>
<td> <p>Foo</p> <p>Bar</p> </td>

</tr></table>

Which is horribly inconsistent, and results (erratically) in the behaviour you say is problematic, but not consistently so. Having parts of the text on different context levels is almost certainly less semantically correct than applying ‎<p> universally; the fact that single-paragraph cells don't get a ‎<p> tag at all is a side-effect, and not necessarily a desirable one: why should adding more content to a cell alter the behaviour of the existing content? And why should the two cells in the table:

{|
| Text
* And 
* List
| 
Text
* And
* List
|}

Have different internal structures?

#Comment by Krinkle (talk | contribs)   12:17, 18 January 2011

You are correct in that the current behaviour (before your commit) doesn't always make sense, but that was the output that has been accepted and lots of stuff depends on it.

For example the sortable script appends linked images to the table cells as sortable-buttons, which, before your commit, where on the same line. Now there's suddenly a "paragraph" in every table cell (including table headings) which makes no sense and thus anything before or after (since P is a block level element) is now on a new line. (try it, make a class="wikitable sortable" table)

Perhaps you can make it so that it looks to see if there's stuff like * # {| coming and treat them as such, and otherwise stick to inline like it used to.

Either way, it's more a bug that the the current parser sometimes adds a paragraph, it's not a bug that it sometimes doesn't.

#Comment by Krinkle (talk | contribs)   12:18, 18 January 2011

(

also bring along CSS

p {
   line-height: 1.5em;
  margin: 0.4em 0px 0.5em;
} 

making table cells much bigger than they used to be.

#Comment by Krinkle (talk | contribs)   12:25, 18 January 2011

sorry for the double post, pressed submit too early.

Testcase

#Comment by Happy-melon (talk | contribs)   14:18, 18 January 2011

But presumably these problems with sortable tables already occur, just not consistently so; whenever someone hasn't started the table header on the same line as the cell marker, or whenever there's a multi-paragraph header. For sure, these cases are not the common ones, but they are perfectly reasonable ones. It would be easy to fix SortableTables by prepending the links and right-floating them, that would also make them work consistently with anything else that might be inserted into a cell. Or you could make them translucent and attached to the upper-right corner of the cell, and become opaque on hover, which is also a common position.

Overall, I'm not denying that anything you've brought up happens; it's the timeless battle between consistency and optimising-for-the-most-common-case. If one piece of subtext is parsed into paragraph tags in a given position, and another substring in the same position is parsed into a list, why in another position should one of the strings be parsed the same, and the other be parsed differently? We could remove paragraph margins inside tables, but why should the same markup appear differently in that context?

#Comment by Happy-melon (talk | contribs)   14:20, 18 January 2011

A follow-up which was a step in the right direction might be to remove the vertical padding from wikitable cells, since we now know that anything placed inside them will have margins of its own.

#Comment by Krinkle (talk | contribs)   14:37, 18 January 2011

SortableTables could be altered to place the linked images at the end of the first paragraph or floating before the paragraph. But that brings other issues of it's own since floating does not force content width. In order words: A table is and should not by default take 100% width. if a cell has a few words in it the cell expand to a width to contain those words on a single line. But if an element floats most browsers don't force the width so it will still come on it's own line above the paragraph.

See below:

{| class="wikitable"
! word <span>[s]</span>
! word <span>[s]</span>
|- 
| cont
| tent
|}

renders:

word [s] word [s]
cont tent


{| class="wikitable"
! <span style="float:right">[s]</span> <p>
word 
! <span style="float:right">[s]</span> <p>
word 
|- 
| <p>
cont
</p>
| <p>
tent
</p>
|}


renders:

[s]

word

[s]

word

cont

tent

And aside from this the point is not that this particlar sortable-script could be fixed. The point is that there are many gadgets, many extension and user scripts both within the 100s of wikis by the Wikimedia Foundation and outside of it and scripts working with tables and dynamic insertion/hiding of content are not rare.

So in my opinion this commit does not fix anything that is currently broken, nor does it enable things that can't be done now (albeit not always as clean in wikitext)

– rather this commit breaks existing code and causes paragraphs in tables (which are almost never wanted, tables are for tabular data, there is no semantically correct  usecase for tabular data that requires being wrapped in paragraphs)  and also extra code (bandwidth?).

Anyway, these are just my opinions and conclusions based on what I see on TranslateWiki and a local wiki running trunk with some of the bigger articles on Wikipedia.

I think people understand that block level elements start on a new line, meaning if they want to start a heading, table or list in a table they enter a new line first. That's what they currently do:

head more
content * list * here

Heading

#Comment by Krinkle (talk | contribs)   14:45, 18 January 2011

not to mention the bots running on Wikipedia that look for certain pieces of content, if from now on blocklevel elements can start inside a table cell without a new line... (such as the following, which renders two list items and a heading on the current trunk)

{|
! * test
* test
|-
| == hi ==
|}

http://translatewiki.net/w/i.php?title=Sandbox&oldid=2643446

.. that means several rules will have to be rewritten.

Consistency in the parser existed:

  • Block level elements start on a new line (regardless whether in a table)
  • Without a new line characters are treated inline (#, *, = etc.)

if there's any place in the parser the above two points are not the case, I think we have a valid bug.

#Comment by Happy-melon (talk | contribs)   16:13, 18 January 2011

So in my opinion this commit does not fix anything that is currently broken

That is manifestly not true; try expanding == {{PAGENAME}} == on en:Talk:*-algebra. The 'fix broken things' part of this bug is to undo the hack which inserted newlines at the start of template expansions if the template text contained block elements (bug 12974 contains a whole raft of places where this is unintuitive and in some cases unfixable). The table stuff is just trying to more carefully resolve the issue (bug 529) which prompted the hack to be put in in the first place.

Consistency in the parser existed

That's precisely what did not exist. Previously, "block level elements start on a new line", except when they are produced from a template, in which case they're block level elements whether or not they start on a new line. So neither of your bullets is correct in the context of template expansion, in the old case.

I'd be reasonably happy with just removing the newline hack without replacing it with the table stuff, but that breaks B/C much more substantially, in that wikitext which was formerly recognised would then not be recognised. Requiring third-party scripts to be updated is a much softer B/C break than requiring markup to be changed; we should prioritise maintaining B/C on a vanilla install.

causes paragraphs in tables (which are almost never wanted, tables are for tabular data, there is no semantically correct usecase for tabular data that requires being wrapped in paragraphs)

‎<td> elements contain flow content; paragraphs are flow content. There is absolutely no semantic reason why it may not be appropriate to put paragraphs inside tables. While you're right that tables shouldn't be used purely for layout, that in no way precludes table cells legitimately containing content which is more than monosyllabic. It's equally justifiable to have or not to have paragraphs inside table cells; ‎<p> tags are 'transparent' in the sense that they are flow content, and also contain flow content. What is definitely less clear and consistent is to sometimes have content inside tags and sometimes not, and especially to mix levels within the same table cell.

and also extra code (bandwidth?)

Oh please. You can save more bytes than this adds by spriting the SortableTables arrows, if you really care that much.

#Comment by Ilmari Karonen (talk | contribs)   16:53, 18 January 2011

Even for strictly tabular data, I can easily image use cases where paragraphs breaks in table cells would be desired. (For example, the content might be a multiparagraph quotation.) Given that, it does seem inconsistent for table cell content to be wrapped in <p> tags only sometimes.

I did some quick tests, and it looks like adding

.wikitable p, .wikitable ul, .wikitable ol, .wikitable dl  { margin-bottom: 0.2em; margin-top: 0.2em; }

and replacing the current table cell padding rule with

.wikitable th, .wikitable td { padding: 0 0.2em; }

ought to solve the CSS issues (modulo conflicts with site styles and/or custom skins; it might be possible to minimize those by making the selector for the padding rule as specific as possible, e.g. table.wikitable tr th, table.wikitable tr td). This does make paragraph breaks narrower inside .wikitable tables than outside them, but IMO that actually looks better.

(Alternatively, it might perhaps be possible to hack the parser to "unwrap" the first child element of a table cell if it's a <p>, and then just set .wikitable p { margin-bottom: 0; }. This would avoid changing the HTML output for the most common use case, and should thus minimize breakage.)

Fixing the sortable and collapsible table JS to handle <p> tags in table cells shouldn't be hard, and would be beneficial in any case (since they can occur already).

#Comment by Krinkle (talk | contribs)   17:50, 18 January 2011

@Happy-melon: The fact that {{PAGENAME}} puts characters that affect wikitext on to page page is a seperate bug. It should encode all characters (all wikitext characters are either forbidden in titles or they are escaped, such as quotes: http://commons.wikimedia.org/wiki/User:Krinkle/%27%27%27test%27%27helo%27%27lo ). In this case a star/asterisk should be encoded as , just like simple single quotes are escaped through magic words like '.

Regarding "Consistency in the parser existed:". I know those two points are not true at the moment, and that's unfortunate.

If it's absolutely sure the parser will always but block level elements inside a table cell (be it the element created through wikitext (headings, lists, tables etc.)) then we could use CSS to remove the padding tables have from nature, but how about the 100s of templates that use style="" to add a certain amount padding to tables for messages, infoboxes, talkpage notices etc. those would all get an additional paragraph and would require not only altering site css, also any and all occurenses of inline styling.

Aside from having to fix all those wikitext, there's still the point that users are taught and bots are programmed to look only for blocklevel elements on new lines (ie. when I write == foobar == in this sentence or some *stars* and #hashes# they are not recognized), inside a table cell right after the pipe, that is inline context and should be treated as such, not broken up more.


I agree with you that fixing the parser where it forces block level context where it's inline (ie. templates inside tables) is even more problematic in case of breakage. So, the best outcome imho is to do neither: keep it the way it was: A sucky annoying inconsistent parser that we all know and love.

#Comment by Ilmari Karonen (talk | contribs)   17:52, 18 January 2011

{{PAGENAME}} is not {{PAGENAMEE}}. It shouldn't be doing any escaping.

#Comment by Nikerabbit (talk | contribs)   17:54, 18 January 2011

But it does, because the parser is horrible and tries to parse everything in the page name, namely * and ''.

#Comment by Krinkle (talk | contribs)   17:55, 18 January 2011

No it does not parse single quotes. It escapes them.

PAGENAMEE is for url encoding. PAGENAME is to display the page name and should not affect wikitext therefor it escapes single quotes, see also here.

#Comment by Ilmari Karonen (talk | contribs)   17:59, 18 January 2011

Hmm, OK, I was wrong. It does call wf wfEscapeWikiText(). So the fix for bug 26781 would presumably be to add "*#;:" to the list of characters it escapes.

#Comment by Krinkle (talk | contribs)   18:03, 18 January 2011

Yes please ;-)

#Comment by Happy-melon (talk | contribs)   19:28, 18 January 2011

And what do you want {{#ifeq:{{PAGENAME}}|*-algebra}} to return?

#Comment by Platonides (talk | contribs)   22:31, 18 January 2011

r80511 and r80512 handle the en:Talk:*-algebra {{PAGENAME}}

#Comment by Krinkle (talk | contribs)   17:52, 18 January 2011

In this case a star/asterisk should be encoded as ⋆, just like simple single quotes are escaped through magic words like '.

That should've been:

In this case a star/asterisk should be encoded as &#x22c6;, just like simple single quotes are escaped through magic words like &#39;.
#Comment by Happy-melon (talk | contribs)   19:34, 18 January 2011

A star and an asterisk are not the same; en:⋆ and en:* are separate articles, and rightly so.

#Comment by Ilmari Karonen (talk | contribs)   18:07, 18 January 2011

Anyway, whatever we do with table parsing and wfEscapeWikiText(), the parts of this rev that don't affect tables should be kept if at all possible. Parsing foo{{#if:x|*bar}} should not produce a bulleted list.

#Comment by Platonides (talk | contribs)   22:54, 18 January 2011

This fixes quirks as paragraphs appearing sometimes inside tables:

<td> Fooo
<p>Bar
</p>
</td>

If we are going to break table compatibility, it would be time for doing a more complete fixup.

#Comment by Krinkle (talk | contribs)   09:22, 20 January 2011

I just attempted to fix a few gadgets and play with CSS rules regarding paragraphs in tables, but that didn't go very far.

  • Tables created through ‎<table> and ‎<tr> in wikitext etc. don't get the added paragraphs
  • Anything not coming from wikitext don't get the added paragraphs (forms with tables, extensions using tables (ie. Special:Translate), TablePager etc.) so assuming there is always a paragraph inside a table cell is wrong and even worse (ie. if we would remove the padding from table cells).

I still have to see why we would have to change the parser so that stuff in a table cell is "as if it were on a new line". Why would we want to accept:

  • Lists starting on the same line as the pipe
  • Headings and tables starting inline (in any other inline situation we dont do this either, why would we in a table ? )
{| 
! Head
! Ing
|-
| == heading works in trunk ==
| * list works in trunk
|}

Lorem ipsum == heading does not work ==
Dolor silet  * not a list.

Try explaining that to someone:

  • Block level elements need to start on a new line, except in a table...; Why ?; Because when a template is put inside a table it was decided it should be treated as if on a new line, so we decided everything in a table should be forced to start as-if it were a new line;.


I think we should simply add the fact that templates in tables behave special to the list of inconsistencies of the current parser. When a new parser is introduced, then we can simply fix everything at once without having to look back (ie. the concept I read on wikitech-l where there would be a period of time with two parsers) - but either way a change breaking current wikitext is wrong because there could be wikis out there with 100s of pages depending on this.

#Comment by Happy-melon (talk | contribs)   09:43, 20 January 2011

The answer to all your questions is "because that's how it was before". I'm only consciously introducing two changes:

  1. Brace constructs should not universally introduce a newline. Foo {{* Bar}} should not produce a bulleted list.
  2. If a particular place is a block context, it should always be a block context; that means that anywhere where a * or # should produce a list, a plain string should produce ‎<p> tags.

As I said, I'm not at all averse to leaving out the first-table-cell stuff and WONTFIXING but 529, but I do not accept that we should all bow in worship before a mystical new parser which will also heal our children and do the washing up. A new parser is just as far away as it has ever been; the closest current contender covers ~80% of the markup in a compiled language and performs broadly the same as the existing one. It will be years before we can even consider a migration process, and in that time bug 12974 will continue to be one of the most-duped on bugzilla.

#Comment by Happy-melon (talk | contribs)   20:14, 16 February 2011

This was reverted in r81012, although without prejudice to its reintroduction.

Status & tagging log