r111891 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r111890‎ | r111891 | r111892 >
Date:21:43, 19 February 2012
Author:dantman
Status:reverted (Comments)
Tags:
Comment:
Handle one part of bug 32545 while improving MediaWiki's support for Microdata in content by adding support for the <data>, <time>, <meta>, and <link> elements. The latter two are only permitted when Microdata is enabled, and for security are only allowed to be actual elements when they have a strict set of attributes set.
Modified paths:
  • /trunk/phase3/RELEASE-NOTES-1.20 (modified) (history)
  • /trunk/phase3/includes/Sanitizer.php (modified) (history)
  • /trunk/phase3/tests/parser/parserTests.txt (modified) (history)

Diff [purge]

Index: trunk/phase3/tests/parser/parserTests.txt
@@ -5420,6 +5420,26 @@
54215421 !! end
54225422
54235423 !! test
 5424+Sanitizer: Validating that <meta> and <link> work, but only for Microdata
 5425+!! input
 5426+<div itemscope>
 5427+ <meta itemprop="hello" content="world">
 5428+ <meta http-equiv="refresh" content="5">
 5429+ <link itemprop="hello" href="{{SERVER}}">
 5430+ <link rel="stylesheet" href="{{SERVER}}">
 5431+</div>
 5432+!! result
 5433+<div itemscope="itemscope">
 5434+<p> <meta itemprop="hello" content="world" />
 5435+ &lt;meta http-equiv="refresh" content="5"&gt;
 5436+</p>
 5437+ <link itemprop="hello" href="http&#58;//Britney-Spears" />
 5438+ &lt;link rel="stylesheet" href="<a rel="nofollow" class="external free" href="http://Britney-Spears">http://Britney-Spears</a>"&gt;
 5439+</div>
 5440+
 5441+!! end
 5442+
 5443+!! test
54245444 Language converter: output gets cut off unexpectedly (bug 5757)
54255445 !! options
54265446 language=zh
Index: trunk/phase3/includes/Sanitizer.php
@@ -364,7 +364,7 @@
365365 * @return string
366366 */
367367 static function removeHTMLtags( $text, $processCallback = null, $args = array(), $extratags = array(), $removetags = array() ) {
368 - global $wgUseTidy;
 368+ global $wgUseTidy, $wgHtml5, $wgAllowMicrodataAttributes;
369369
370370 static $htmlpairsStatic, $htmlsingle, $htmlsingleonly, $htmlnest, $tabletags,
371371 $htmllist, $listtags, $htmlsingleallowed, $htmlelementsStatic, $staticInitialised;
@@ -381,12 +381,19 @@
382382 'ruby', 'rt' , 'rb' , 'rp', 'p', 'span', 'abbr', 'dfn',
383383 'kbd', 'samp'
384384 );
 385+ if ( $wgHtml5 ) {
 386+ $htmlpairsStatic = array_merge( $htmlpairsStatic, array( 'data', 'time' ) );
 387+ }
385388 $htmlsingle = array(
386389 'br', 'hr', 'li', 'dt', 'dd'
387390 );
388391 $htmlsingleonly = array( # Elements that cannot have close tags
389392 'br', 'hr'
390393 );
 394+ if ( $wgHtml5 && $wgAllowMicrodataAttributes ) {
 395+ $htmlsingle[] = $htmlsingleonly[] = 'meta';
 396+ $htmlsingle[] = $htmlsingleonly[] = 'link';
 397+ }
391398 $htmlnest = array( # Tags that can be nested--??
392399 'table', 'tr', 'td', 'th', 'div', 'blockquote', 'ol', 'ul',
393400 'dl', 'font', 'big', 'small', 'sub', 'sup', 'span'
@@ -528,6 +535,10 @@
529536 call_user_func_array( $processCallback, array( &$params, $args ) );
530537 }
531538
 539+ if ( !Sanitizer::validateTag( $params, $t ) ) {
 540+ $badtag = true;
 541+ }
 542+
532543 # Strip non-approved attributes from the tag
533544 $newparams = Sanitizer::fixTagAttributes( $params, $t );
534545 }
@@ -709,6 +720,37 @@
710721 }
711722
712723 /**
 724+ * Takes attribute names and values for a tag and the tah name and
 725+ * validates that the tag is allowed to be present.
 726+ * This DOES NOT validate the attributes, nor does it validate the
 727+ * tags themselves. This method only handles the special circumstances
 728+ * where we may want to allow a tag within content but ONLY when it has
 729+ * specific attributes set.
 730+ *
 731+ * @param $
 732+ */
 733+ static function validateTag( $params, $element ) {
 734+ $params = Sanitizer::decodeTagAttributes( $params );
 735+
 736+ if ( $element == 'meta' || $element == 'link' ) {
 737+ if ( !isset( $params['itemprop'] ) ) {
 738+ // <meta> and <link> must have an itemprop="" otherwise they are not valid or safe in content
 739+ return false;
 740+ }
 741+ if ( $element == 'meta' && !isset( $params['content'] ) ) {
 742+ // <meta> must have a content="" for the itemprop
 743+ return false;
 744+ }
 745+ if ( $element == 'link' && !isset( $params['href'] ) ) {
 746+ // <link> must have an associated href=""
 747+ return false;
 748+ }
 749+ }
 750+
 751+ return true;
 752+ }
 753+
 754+ /**
713755 * Take an array of attribute names and values and normalize or discard
714756 * illegal values for the given element type.
715757 *
@@ -809,7 +851,7 @@
810852 unset( $out['itemid'] );
811853 unset( $out['itemref'] );
812854 }
813 - # TODO: Strip itemprop if we aren't descendants of an itemscope.
 855+ # TODO: Strip itemprop if we aren't descendants of an itemscope or pointed to by an itemref.
814856 }
815857 return $out;
816858 }
@@ -1483,7 +1525,7 @@
14841526
14851527 # Numbers refer to sections in HTML 4.01 standard describing the element.
14861528 # See: http://www.w3.org/TR/html4/
1487 - $whitelist = array (
 1529+ $whitelist = array(
14881530 # 7.5.4
14891531 'div' => $block,
14901532 'center' => $common, # deprecated
@@ -1611,7 +1653,24 @@
16121654 # 'title' may not be 100% valid here; it's XHTML
16131655 # http://www.w3.org/TR/REC-MathML/
16141656 'math' => array( 'class', 'style', 'id', 'title' ),
 1657+ );
 1658+
 1659+ if ( $wgHtml5 ) {
 1660+ # HTML5 elements, defined by:
 1661+ # http://www.whatwg.org/specs/web-apps/current-work/multipage/
 1662+ $whitelist += array(
 1663+ 'data' => array_merge( $common, array( 'value' ) ),
 1664+ 'time' => array_merge( $common, array( 'datetime' ) ),
 1665+
 1666+ // meta and link are only present when Microdata is allowed anyways
 1667+ // so we don't bother adding another condition here
 1668+ // meta and link are only valid for use as Microdata so we do not
 1669+ // allow the common attributes here.
 1670+ 'meta' => array( 'itemprop', 'content' ),
 1671+ 'link' => array( 'itemprop', 'href' ),
16151672 );
 1673+ }
 1674+
16161675 return $whitelist;
16171676 }
16181677
Index: trunk/phase3/RELEASE-NOTES-1.20
@@ -22,6 +22,8 @@
2323 * (bug 34475) Add support for IP/CIDR notation to tablesorter
2424 * (bug 27619) Remove preference option to display broken links as link?
2525 * (bug 15404) Add support for sorting fractions in jquery.tablesorter
 26+* The <data>, <time>, <meta>, and <link> elements are allowed within WikiText for use
 27+ with Microdata.
2628
2729 === Bug fixes in 1.20 ===
2830 * (bug 30245) Use the correct way to construct a log page title.

Follow-up revisions

RevisionCommit summaryAuthorDate
r111898Followup r111891; Base the $staticInitialised variable off of the global setu...dantman23:40, 19 February 2012
r111901Followup r111891; Update the test to also make sure things like http-equiv an...dantman00:42, 20 February 2012
r111903Followup r111891; Update tidy.conf so that it doesn't strip out <data> and <t...dantman01:01, 20 February 2012
r111973Revert Microdata improvements in r111891, r111898, r111899, r111901, r111903,...dantman22:32, 20 February 2012

Comments

#Comment by RobLa-WMF (talk | contribs)   04:18, 20 February 2012

Hi Dantman, who are you planning to have review this work?

#Comment by Dantman (talk | contribs)   04:28, 20 February 2012

I don't usually plan out "who" is going to review code, someone usually just does. We have RDFa and Microdata code in MW already so someone must have reviewed that stuff.

#Comment by RobLa-WMF (talk | contribs)   21:42, 20 February 2012

Hi Dantman, I asked people to have a reviewer lined up before committing new stuff: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/58670/focus=58671

"Have a reviewer lined up ready to ok your revision *before you commit it*. If you can't get a tentative commitment from a reviewer, don't commit it."

Status & tagging log