r58694 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r58693‎ | r58694 | r58695 >
Date:09:43, 7 November 2009
Author:daniel
Status:reverted (Comments)
Tags:
Comment:
allow <a> tags and RDFa attributes to support RDFa output from license templates etc
Modified paths:
  • /trunk/phase3/includes/Sanitizer.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/Sanitizer.php
@@ -353,7 +353,7 @@
354354 if ( !$staticInitialised ) {
355355
356356 $htmlpairsStatic = array( # Tags that must be closed
357 - 'b', 'del', 'i', 'ins', 'u', 'font', 'big', 'small', 'sub', 'sup', 'h1',
 357+ 'a', 'b', 'del', 'i', 'ins', 'u', 'font', 'big', 'small', 'sub', 'sup', 'h1',
358358 'h2', 'h3', 'h4', 'h5', 'h6', 'cite', 'code', 'em', 's',
359359 'strike', 'strong', 'tt', 'var', 'div', 'center',
360360 'blockquote', 'ol', 'ul', 'dl', 'table', 'caption', 'pre',
@@ -605,6 +605,8 @@
606606 */
607607 static function validateAttributes( $attribs, $whitelist ) {
608608 $whitelist = array_flip( $whitelist );
 609+ $hrefExp = '/^(' . wfUrlProtocols() . ')[^\s]+$/';
 610+
609611 $out = array();
610612 foreach( $attribs as $attribute => $value ) {
611613 if( !isset( $whitelist[$attribute] ) ) {
@@ -626,6 +628,23 @@
627629 $wgEnforceHtmlIds ? 'noninitial' : 'xml' );
628630 }
629631
 632+ if ( $attribute === 'href' || $attribute === 'src' ) {
 633+ if ( !preg_match( $hrefExp, $value ) ) {
 634+ continue; //drop any href or src attributes not using an allowed protocol.
 635+ //NOTE: this also drops all relative URLs
 636+ }
 637+ }
 638+
 639+ //RDFa properties allow URIs. check them
 640+ if ( $attribute === 'rel' || $attribute === 'rev' ||
 641+ $attribute === 'about' || $attribute === 'property' || $attribute === 'resource' ||
 642+ $attribute === 'datatype' || $attribute === 'typeof' ) {
 643+ //Paranoia. Allow "simple" values but suppress javascript
 644+ if ( preg_match( '/(^|\s)javascript\s*:/i', $value ) ) {
 645+ continue;
 646+ }
 647+ }
 648+
630649 // If this attribute was previously set, override it.
631650 // Output should only have one attribute of each name.
632651 $out[$attribute] = $value;
@@ -1154,7 +1173,11 @@
11551174 * @return Array
11561175 */
11571176 static function setupAttributeWhitelist() {
1158 - $common = array( 'id', 'class', 'lang', 'dir', 'title', 'style' );
 1177+ $common = array( 'id', 'class', 'lang', 'dir', 'title', 'style',
 1178+ #RDFa attributes as specified in section 9 of http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014
 1179+ 'about', 'property', 'resource', 'datatype', 'typeof',
 1180+ );
 1181+
11591182 $block = array_merge( $common, array( 'align' ) );
11601183 $tablealign = array( 'align', 'char', 'charoff', 'valign' );
11611184 $tablecell = array( 'abbr',
@@ -1260,6 +1283,9 @@
12611284 'td' => array_merge( $common, $tablecell, $tablealign ),
12621285 'th' => array_merge( $common, $tablecell, $tablealign ),
12631286
 1287+ # 12.2
 1288+ 'a' => array_merge( $common, array( 'href', 'rel', 'rev' ) ), # rel/rev esp. for RDFa
 1289+
12641290 # 13.2
12651291 # Not usually allowed, but may be used for extension-style hooks
12661292 # such as <math> when it is rasterized

Follow-up revisions

RevisionCommit summaryAuthorDate
r58711reverting r58694, needs to be done as parser tag hook in order to register as...daniel15:03, 7 November 2009
r109723Introducing optional support for <a> tags, to be used with microdata resp. RD...daniel05:58, 22 January 2012

Comments

#Comment by MaxSem (talk | contribs)   10:14, 7 November 2009

Now imagine a typical spambot that posts something like <a href="http://example.com" style="display:none">. While the href attribute will be ignored, the whole spam addition will still be parsed as HTML and therefore hidden, making detection harder.

#Comment by MaxSem (talk | contribs)   10:45, 7 November 2009

Probably, RDFa support should be implemented as a parser tag?

#Comment by Duesentrieb (talk | contribs)   14:36, 7 November 2009

not convinced by the spammer argument, but thinking about it, the url needs to be registered as an external link with the parser output, and needs to trigger the spam filter hooks, so it's probably best to use a parser function after all. will revert.

#Comment by Duesentrieb (talk | contribs)   15:04, 7 November 2009

reverted in r58711

Status & tagging log