r82480 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r82479‎ | r82480 | r82481 >
Date:20:16, 19 February 2011
Author:platonides
Status:ok (Comments)
Tags:
Comment:
(Bug 27539) Allow attributes beginning with a digit in wiktext tag parameters.
Its removal in r70849 breaks ProofreadPage extension.
Restricted r82475 relaxation to just numbers.
Added tests.

This only affects wikitext (tag hooks).
MW_ATTRIBS_REGEX is only used through decodeTagAttributes() calls.
fixTagAttributes() calls decodeTagAttributes(), and would be nastier to
fix, since it is called with HTML parameters (eg. by removeHTMLtags)
but such incorrect parameters grabbed would be removed by validateTagAttributes()
Modified paths:
  • /trunk/phase3/includes/Sanitizer.php (modified) (history)
  • /trunk/phase3/tests/phpunit/includes/SanitizerTest.php (modified) (history)

Diff [purge]

Index: trunk/phase3/tests/phpunit/includes/SanitizerTest.php
@@ -68,5 +68,46 @@
6969 'Self-closing closing div'
7070 );
7171 }
 72+
 73+ function testDecodeTagAttributes() {
 74+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo=bar' ), array( 'foo' => 'bar' ), 'Unquoted attribute' );
 75+ $this->assertEquals( Sanitizer::decodeTagAttributes( ' foo = bar ' ), array( 'foo' => 'bar' ), 'Spaced attribute' );
 76+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo="bar"' ), array( 'foo' => 'bar' ), 'Double-quoted attribute' );
 77+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo=\'bar\'' ), array( 'foo' => 'bar' ), 'Single-quoted attribute' );
 78+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo=\'bar\' baz="foo"' ), array( 'foo' => 'bar', 'baz' => 'foo' ), 'Several attributes' );
 79+
 80+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo=\'bar\' baz="foo"' ), array( 'foo' => 'bar', 'baz' => 'foo' ), 'Several attributes' );
 81+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo=\'bar\' baz="foo"' ), array( 'foo' => 'bar', 'baz' => 'foo' ), 'Several attributes' );
 82+
 83+ $this->assertEquals( Sanitizer::decodeTagAttributes( ':foo=\'bar\'' ), array( ':foo' => 'bar' ), 'Leading :' );
 84+ $this->assertEquals( Sanitizer::decodeTagAttributes( '_foo=\'bar\'' ), array( '_foo' => 'bar' ), 'Leading _' );
 85+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'Foo=\'bar\'' ), array( 'foo' => 'bar' ), 'Leading capital' );
 86+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'FOO=BAR' ), array( 'foo' => 'BAR' ), 'Attribute keys are normalized to lowercase' );
 87+
 88+ # Invalid beginning
 89+ $this->assertEquals( Sanitizer::decodeTagAttributes( '-foo=bar' ), array(), 'Leading - is forbidden' );
 90+ $this->assertEquals( Sanitizer::decodeTagAttributes( '.foo=bar' ), array(), 'Leading . is forbidden' );
 91+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo-bar=bar' ), array( 'foo-bar' => 'bar' ), 'A - is allowed inside the attribute' );
 92+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo-=bar' ), array( 'foo-' => 'bar' ), 'A - is allowed inside the attribute' );
 93+
 94+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo.bar=baz' ), array( 'foo.bar' => 'baz' ), 'A . is allowed inside the attribute' );
 95+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo.=baz' ), array( 'foo.' => 'baz' ), 'A . is allowed as last character' );
 96+
 97+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo6=baz' ), array( 'foo6' => 'baz' ), 'Numbers are allowed' );
 98+
 99+ # This bit is more relaxed than XML rules, but some extensions use it, like ProofreadPage (see bug 27539)
 100+ $this->assertEquals( Sanitizer::decodeTagAttributes( '1foo=baz' ), array( '1foo' => 'baz' ), 'Leading numbers are allowed' );
 101+
 102+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo$=baz' ), array(), 'Symbols are not allowed' );
 103+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo@=baz' ), array(), 'Symbols are not allowed' );
 104+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo~=baz' ), array(), 'Symbols are not allowed' );
 105+
 106+
 107+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo=1[#^`*%w/(' ), array( 'foo' => '1[#^`*%w/(' ), 'All kind of characters are allowed as values' );
 108+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo="1[#^`*%\'w/("' ), array( 'foo' => '1[#^`*%\'w/(' ), 'Double quotes are allowed if quoted by single quotes' );
 109+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo=\'1[#^`*%"w/(\'' ), array( 'foo' => '1[#^`*%"w/(' ), 'Single quotes are allowed if quoted by double quotes' );
 110+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo=&"' ), array( 'foo' => '&"' ), 'Special chars can be provided as entities' );
 111+ $this->assertEquals( Sanitizer::decodeTagAttributes( 'foo=&foobar;' ), array( 'foo' => '&foobar;' ), 'Entity-like items are accepted' );
 112+ }
72113 }
73114
Index: trunk/phase3/includes/Sanitizer.php
@@ -39,7 +39,7 @@
4040 * Allows some... latitude.
4141 * Used in Sanitizer::fixTagAttributes and Sanitizer::decodeTagAttributes
4242 */
43 -$attribFirst = '[:A-Z_a-z-.0-9]'; // more lenient than standards (by allowing [-.0-9])
 43+$attribFirst = '[:A-Z_a-z0-9]';
4444 $attrib = '[:A-Z_a-z-.0-9]';
4545 $space = '[\x09\x0a\x0d\x20]';
4646 define( 'MW_ATTRIBS_REGEX',

Sign-offs

UserFlagDate
Catropeinspected20:30, 21 February 2011

Follow-up revisions

RevisionCommit summaryAuthorDate
r825761.17wmf1: MFT r82475, r82480, r82538, r82547, r82550, r82552, r82554, r82555,...catrope21:37, 21 February 2011
r85211MFT: r82297, r82307, r82309, r82312, r82315, r82337, r82391, r82392, r82403, ...demon21:01, 2 April 2011

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r70849Fix to #17031. See bug ticket for test code & test-results.sean_colombo07:19, 11 August 2010
r82475Relaxed the allowed format of parser tag attributes as per bug 27539. One sid...sean_colombo19:25, 19 February 2011

Comments

#Comment by Catrope (talk | contribs)   20:30, 21 February 2011

Code is OK, haven't reviewed the tests.

Status & tagging log