r113297 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r113296‎ | r113297 | r113298 >
Date:21:06, 7 March 2012
Author:wikinaut
Status:reverted (Comments)
Tags:gerritmigration 
Comment:
fix for bug34763 'RSS feed items (HTML) are not rendered as HTML but htmlescaped'; tolerated controlled regression bug30377 'feed item length limitation', because this now becomes very tricky when we allow some tags in order to close bug 34763.
Modified paths:
  • /trunk/extensions/RSS/RELEASE-NOTES (modified) (history)
  • /trunk/extensions/RSS/RSS.php (modified) (history)
  • /trunk/extensions/RSS/RSSParser.php (modified) (history)

Diff [purge]

Index: trunk/extensions/RSS/RELEASE-NOTES
@@ -11,6 +11,13 @@
1212 (otherwise using the defaults - PHP will abort the entire program when your
1313 memory usage gets too high)
1414
 15+=== Version 2.12 2012-03-07 ===
 16+* bug fix 34763 "RSS feed items (HTML) are not rendered as HTML but htmlescaped"
 17+* regression bug 30377 "Add a new parameter to limit the number of characters
 18+ when rendering the channel item <description>". Feed item string length
 19+ limitation is difficult when we allow HTML <a> or <img> tags, because a mere
 20+ content-unaware limitation breaks (can break) tags which results in disastrous
 21+ rendering results.
1522
1623 === Version 2.11 2012-02-29 ===
1724 * function name typo correction
Index: trunk/extensions/RSS/RSSParser.php
@@ -312,6 +312,14 @@
313313 return $ret;
314314 }
315315
 316+ function sandboxParse($wikiText) {
 317+ global $wgTitle, $wgUser;
 318+ $myParser = new Parser();
 319+ $myParserOptions = ParserOptions::newFromUser($wgUser);
 320+ $result = $myParser->parse($wikiText, $wgTitle, $myParserOptions);
 321+ return $result->getText();
 322+ }
 323+
316324 /**
317325 * Render the entire feed so that each item is passed to the
318326 * template which the MediaWiki then displays.
@@ -320,7 +328,7 @@
321329 * @param $frame the frame param to pass to recursiveTagParse()
322330 */
323331 function renderFeed( $parser, $frame ) {
324 -
 332+
325333 $renderedFeed = '';
326334
327335 if ( isset( $this->itemTemplate ) && isset( $parser ) && isset( $frame ) ) {
@@ -336,15 +344,15 @@
337345 }
338346
339347 if ( $this->canDisplay( $item ) ) {
340 - $renderedFeed .= $this->renderItem( $item ) . "\n";
 348+ $renderedFeed .= $this->renderItem( $item, $parser ) . "\n";
341349 $headcnt++;
342350 }
343351 }
344352
345 - $renderedFeed = $parser->recursiveTagParse( $renderedFeed, $frame );
 353+ $renderedFeed = $this->sandboxParse( $renderedFeed );
346354
347 - }
348 -
 355+ }
 356+
349357 return $renderedFeed;
350358 }
351359
@@ -353,7 +361,7 @@
354362 *
355363 * @param $item Array: an array produced by RSSData where keys are the names of the RSS elements
356364 */
357 - protected function renderItem( $item ) {
 365+ protected function renderItem( $item, $parser ) {
358366
359367 $renderedItem = $this->itemTemplate;
360368
@@ -385,12 +393,14 @@
386394 $renderedItem = str_replace( '{{{date}}}', $txt, $renderedItem );
387395 break;
388396 default:
389 - $str = $this->escapeTemplateParameter( $item[$info] );
 397+ $str = $this->escapeTemplateParameter( $item[$info] );
 398+ /***
390399 if ( mb_strlen( $str ) > $this->ItemMaxLength ) {
391400 $str = mb_substr( $str, 0, $this->ItemMaxLength ) . " ...";
392401 }
 402+ ***/
393403 $txt = $this->highlightTerms( $str );
394 - $renderedItem = str_replace( '{{{' . $info . '}}}', $txt, $renderedItem );
 404+ $renderedItem = str_replace( '{{{' . $info . '}}}', $parser->insertStripItem( $str ), $renderedItem );
395405 }
396406 }
397407
@@ -434,41 +444,60 @@
435445 * to the other kinds of markup, to avoid user input ending a template
436446 * invocation.
437447 *
438 - * We change differently flavoured <p> and <br> tags to effective <br> tags,
439 - * other tags such as <a> will be rendered html-escaped.
 448+ * If you want to allow clickable link Urls (HTML <a> tag) in RSS feeds:
 449+ * $wgRSSAllowLinkTag = true;
440450 *
 451+ * If you want to allow images (HTML <img> tag) in RSS feeds:
 452+ * $wgAllowImageTag = true;
 453+ *
441454 */
442455 protected function escapeTemplateParameter( $text ) {
443 - $text = str_replace(
444 - array( '[', '|', ']', '\'', 'ISBN ',
445 - 'RFC ', '://', "\n=", '{{', '}}',
446 - ),
447 - array( '&#91;', '&#124;', '&#93;', '&#39;', 'ISBN&#32;',
448 - 'RFC&#32;', '&#58;//', "\n&#61;", '&#123;&#123;', '&#125;&#125;',
449 - ),
450 - htmlspecialchars( str_replace( "\n", "", $text ) )
451 - );
 456+ global $wgRSSAllowLinkTag, $wgAllowImageTag;
452457
453 - // keep some basic layout tags
454 - $text = str_replace(
455 - array( '&lt;p&gt;', '&lt;/p&gt;',
456 - '&lt;br/&gt;', '&lt;br&gt;', '&lt;/br&gt;',
457 - '&lt;b&gt;', '&lt;/b&gt;',
458 - '&lt;i&gt;', '&lt;/i&gt;',
459 - '&lt;u&gt;', '&lt;/u&gt;',
460 - '&lt;s&gt;', '&lt;/s&gt;',
461 - ),
462 - array( "", "<br/>",
463 - "<br/>", "<br/>", "<br/>",
464 - "'''", "'''",
465 - "''", "''",
466 - "<u>", "</u>",
467 - "<s>", "</s>",
468 - ),
469 - $text
470 - );
 458+ if ( isset( $wgRSSAllowLinkTag ) && $wgRSSAllowLinkTag ) {
 459+ $extra = array( "a" );
 460+ } else {
 461+ $extra = array();
 462+ }
471463
472 - return $text;
 464+ if ( ( isset( $wgRSSAllowLinkTag ) && $wgRSSAllowLinkTag )
 465+ || ( isset( $wgAllowImageTag ) && $wgAllowImageTag ) ) {
 466+
 467+ $ret = Sanitizer::removeHTMLtags( $text, null, array(), $extra, array( "iframe" ) );
 468+
 469+ } else { // use the old escape method for a while
 470+
 471+ $text = str_replace(
 472+ array( '[', '|', ']', '\'', 'ISBN ',
 473+ 'RFC ', '://', "\n=", '{{', '}}',
 474+ ),
 475+ array( '&#91;', '&#124;', '&#93;', '&#39;', 'ISBN&#32;',
 476+ 'RFC&#32;', '&#58;//', "\n&#61;", '&#123;&#123;', '&#125;&#125;',
 477+ ),
 478+ htmlspecialchars( str_replace( "\n", "", $text ) )
 479+ );
 480+
 481+ // keep some basic layout tags
 482+ $ret = str_replace(
 483+ array( '&lt;p&gt;', '&lt;/p&gt;',
 484+ '&lt;br/&gt;', '&lt;br&gt;', '&lt;/br&gt;',
 485+ '&lt;b&gt;', '&lt;/b&gt;',
 486+ '&lt;i&gt;', '&lt;/i&gt;',
 487+ '&lt;u&gt;', '&lt;/u&gt;',
 488+ '&lt;s&gt;', '&lt;/s&gt;',
 489+ ),
 490+ array( "", "<br/>",
 491+ "<br/>", "<br/>", "<br/>",
 492+ "'''", "'''",
 493+ "''", "''",
 494+ "<u>", "</u>",
 495+ "<s>", "</s>",
 496+ ),
 497+ $text
 498+ );
 499+ }
 500+
 501+ return $ret;
473502 }
474503
475504 /**
Index: trunk/extensions/RSS/RSS.php
@@ -4,7 +4,7 @@
55 *
66 * @file
77 * @ingroup Extensions
8 - * @version 2.11
 8+ * @version 2.12
99 * @author mutante, Daniel Kinzler, Rdb, Mafs, Thomas Gries, Alxndr, Chris Reigrut, K001
1010 * @author Kellan Elliott-McCrea <kellan@protest.net> -- author of MagpieRSS
1111 * @author Jeroen De Dauw
@@ -14,7 +14,7 @@
1515 * @link http://www.mediawiki.org/wiki/Extension:RSS Documentation
1616 */
1717
18 -define( "EXTENSION_RSS_VERSION", "2.11 20120229" );
 18+define( "EXTENSION_RSS_VERSION", "2.12 20120307" );
1919
2020 if ( !defined( 'MEDIAWIKI' ) ) {
2121 die( "This is not a valid entry point.\n" );
@@ -93,5 +93,12 @@
9494
9595 // limit the number of characters in the item description
9696 // or set to false for unlimited length.
97 -// $wgRSSItemMaxLength = false;
 97+// THIS IS CURRENTLY NOT WORKING (bug 30377)
9898 $wgRSSItemMaxLength = false;
 99+
 100+// You can choose to allow active links in feed items; default: false
 101+$wgRSSAllowLinkTag = false;
 102+
 103+// If you want to see images in feed items, then you need to globally allow
 104+// image tags in your wiki by using the MediaWiki parameter; default: false
 105+// $wgAllowImageTag = true;

Follow-up revisions

RevisionCommit summaryAuthorDate
r114390Revert r111347, r111348, r111350, r111351, r111515, r111816, r112243, r112251......catrope18:40, 21 March 2012

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r111350fix for bug30377 : add a new parameter to limit the number of characters when...wikinaut07:23, 13 February 2012
r112709function name typo correction. Version number updatewikinaut20:00, 29 February 2012

Comments

#Comment by Bawolff (talk | contribs)   02:53, 8 March 2012

$ret = Sanitizer::removeHTMLtags( $text, null, array(), $extra, array( "iframe" ) );

Why is iframe specifically mentioned? If there's a good reason, should have a comment.


+ $renderedFeed = $this->sandboxParse( $renderedFeed );

Why does this use its own instance of the parser?

In regards to initializing own instance of parser, initializing the parser using $wgUser 's options, and $wgTitle doesn't seem like the greatest idea. $parser->getTitle() would probably be better than $wgTitle, and for $wgUser, sometimes the parser needs to parse things using a user's options other than the current user (I think it does this to get a canonical version of the page for links tables, but don't quote me on that). You can get the current parser options using $parser->getOptions(). However as I said above, it doesn't seem like the sandboxParse is actually necessary in your code.

#Comment by Wikinaut (talk | contribs)   06:51, 8 March 2012

ad i) iframe: I saw in a E:RSS rendered feed http://rss.slashdot.org/Slashdot/slashdot one or several iframe tags which came through htmlescaped. I will make a deeper analysis of this specific case of iframes. I just wanted to be sure that iframes are expressly escaped, however we can further discuss this depending on my analysis.

ad ii) this was also discussed with Brion, and not working in the first test: not working without a (local sandbox) parsing in the local context as described here https://www.mediawiki.org/wiki/Manual:Special_pages#workaround_.231 from where it is verbatim copied. Your comment means that the text on https://www.mediawiki.org/wiki/Manual:Special_pages#workaround_.231 should be further corrected (Brion already corrected https://www.mediawiki.org/w/index.php?title=Manual%3ASpecial_pages&diff=505706&oldid=490966 "fix for calling private func; use the public one").

Summary: I will checked and make further analysis of the two points you had. However, the current code is working as intended, can perhaps be improved. I invite everyone - especially parser experts - to help me here, because it was a long and windy road already.

Does this sufficiently answer your questions ?

#Comment by Wikinaut (talk | contribs)   07:51, 8 March 2012

RE: + $renderedFeed = $this->sandboxParse( $renderedFeed );

I double-checked it. result: the sandbox-parsign is needed. If I use recursiveTagParse the wiki page is not correctly rendered and elements from the content window break out and appear outside, navigation bar broken etc. So what has been written on https://www.mediawiki.org/wiki/Manual:Special_pages#workaround_.231 applies to the present issue.

Status & tagging log