r94465 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r94464‎ | r94465 | r94466 >
Date:16:41, 14 August 2011
Author:dantman
Status:resolved (Comments)
Tags:
Comment:
Add code to the sanitizer to convert presontational attributes that were removed in html5 into inline css. This allows wiki to keep using them in short loose WikiText but still output valid modern markup.
Note that there were some attributes excluded. Namely stuff on img and object, and the table cellspacing and cellpading which aren't easily converted into inline css.
Modified paths:
  • /trunk/phase3/RELEASE-NOTES-1.19 (modified) (history)
  • /trunk/phase3/includes/DefaultSettings.php (modified) (history)
  • /trunk/phase3/includes/Sanitizer.php (modified) (history)

Diff [purge]

Index: trunk/phase3/RELEASE-NOTES-1.19
@@ -33,6 +33,9 @@
3434 * Added two new GetLocalURL hooks to better serve extensions working on a limited
3535 type of titles.
3636 * Added a --no-updates flag to importDump.php that skips updating the links tables.
 37+* Most presentational html attributes like valign are now converted to inline
 38+ css style rules. These attributes were removed from html5 and so we clean them up
 39+ when $wgHtml5 is enabled. This can be disabled using $wgCleanupPresentationalAttributes.
3740
3841 === Bug fixes in 1.19 ===
3942 * $wgUploadNavigationUrl should be used for file redlinks if
Index: trunk/phase3/includes/Sanitizer.php
@@ -603,6 +603,89 @@
604604 }
605605
606606 /**
 607+ * Take an array of attribute names and values and fix some deprecated values
 608+ * for the given element type.
 609+ * This does not validate properties, so you should ensure that you call
 610+ * validateTagAttributes AFTER this to ensure that the resulting style rule
 611+ * this may add is safe.
 612+ *
 613+ * - Converts most presentational attributes like align into inline css
 614+ *
 615+ * @param $attribs Array
 616+ * @param $element String
 617+ * @return Array
 618+ */
 619+ static function fixDeprecatedAttributes( $attribs, $element ) {
 620+ global $wgHtml5, $wgCleanupPresentationalAttributes;
 621+
 622+ // presentational attributes were removed from html5, we can leave them
 623+ // in when html5 is turned off
 624+ if ( !$wgHtml5 || !$wgCleanupPresentationalAttributes ) {
 625+ return $attribs;
 626+ }
 627+
 628+ $table = array( 'table' );
 629+ $cells = array( 'td', 'th' );
 630+ $colls = array( 'col', 'colgroup' );
 631+ $tblocks = array( 'tbody', 'tfoot', 'thead' );
 632+ $h = array( 'h1', 'h2', 'h3', 'h4', 'h5', 'h6' );
 633+
 634+ $presentationalAttribs = array(
 635+ 'align' => array( 'text-align', array_merge( array( 'caption', 'hr', 'div', 'p', 'tr' ), $table, $cells, $colls, $tblocks, $h ) ),
 636+ 'clear' => array( 'clear', array( 'br' ) ),
 637+ 'height' => array( 'height', $cells ),
 638+ 'nowrap' => array( 'white-space', $cells ),
 639+ 'size' => array( 'height', array( 'hr' ) ),
 640+ 'type' => array( 'list-style-type', array( 'li', 'ol', 'ul' ) ),
 641+ 'valign' => array( 'vertical-align', array_merge( $cells, $colls, $tblocks ) ),
 642+ 'width' => array( 'width', array_merge( array( 'hr', 'pre' ), $table, $cells, $colls ) ),
 643+ );
 644+
 645+ $style = "";
 646+ foreach ( $presentationalAttribs as $attribute => $info ) {
 647+ list( $property, $elements ) = $info;
 648+
 649+ // Skip if this attribute is not relevant to this element
 650+ if ( !in_array( $element, $elements ) ) {
 651+ continue;
 652+ }
 653+
 654+ // Skip if the attribute is not used
 655+ if ( !array_key_exists( $attribute, $attribs ) ) {
 656+ continue;
 657+ }
 658+
 659+ $value = $attribs[$attribute];
 660+
 661+ // For nowrap the value should be nowrap instead of whatever text is in the value
 662+ if ( $attribute === 'nowrap' ) {
 663+ $value = 'nowrap';
 664+ }
 665+
 666+ // Size based properties should have px applied to them if they have no unit
 667+ if ( in_array( $attribute, array( 'height', 'width', 'size' ) ) ) {
 668+ if ( preg_match( '/^[\d.]+$/', $value ) ) {
 669+ $value = "{$value}px";
 670+ }
 671+ }
 672+
 673+ $style .= " $property: $value;";
 674+
 675+ unset( $attribs[$attribute] );
 676+ }
 677+
 678+ if ( !empty($style) ) {
 679+ // Prepend our style rules so that they can be overridden by user css
 680+ if ( isset($attribs['style']) ) {
 681+ $style .= " " . $attribs['style'];
 682+ }
 683+ $attribs['style'] = trim($style);
 684+ }
 685+
 686+ return $attribs;
 687+ }
 688+
 689+ /**
607690 * Take an array of attribute names and values and normalize or discard
608691 * illegal values for the given element type.
609692 *
@@ -849,8 +932,9 @@
850933 return '';
851934 }
852935
853 - $stripped = Sanitizer::validateTagAttributes(
854 - Sanitizer::decodeTagAttributes( $text ), $element );
 936+ $decoded = Sanitizer::decodeTagAttributes( $text );
 937+ $decoded = Sanitizer::fixDeprecatedAttributes( $decoded, $element );
 938+ $stripped = Sanitizer::validateTagAttributes( $decoded, $element );
855939
856940 $attribs = array();
857941 foreach( $stripped as $attribute => $value ) {
Index: trunk/phase3/includes/DefaultSettings.php
@@ -2167,6 +2167,11 @@
21682168 $wgAllowMicrodataAttributes = false;
21692169
21702170 /**
 2171+ * Cleanup as much presentational html like valign -> css vertical-align as we can
 2172+ */
 2173+$wgCleanupPresentationalAttributes = false;
 2174+
 2175+/**
21712176 * Should we try to make our HTML output well-formed XML? If set to false,
21722177 * output will be a few bytes shorter, and the HTML will arguably be more
21732178 * readable. If set to true, life will be much easier for the authors of

Follow-up revisions

RevisionCommit summaryAuthorDate
r98051Followup r94465; Don't use empty.dantman02:24, 25 September 2011
r98052Followup r94465; Release notes.dantman02:27, 25 September 2011
r98053Followup r94465; Add parser tests and turn the feature on-by-default like I i...dantman02:58, 25 September 2011
r98055Followup r94465 and r94465; Add phpunit tests for Sanitizer::fixDeprecatedAtt...dantman04:08, 25 September 2011

Comments

#Comment by 😂 (talk | contribs)   20:33, 14 September 2011
  • Needs tests
  • Sanitizer:: fixDeprecatedAttributes() shouldn't check the global, callers should be responsible for checking.
  • Don't use empty(), a weak boolean check on $style should be sufficient. If you're super paranoid, use $empty !== ""
#Comment by Dantman (talk | contribs)   02:24, 25 September 2011

Sanitizer::fixDeprecatedAttributes is called by Sanitizer::fixTagAttributes which has been around for awhile. You really think we should change fixTagAttributes' method just to have callers pass in the value of a global config var?

Status & tagging log