r49654 MediaWiki - Code Review archive

Repository:	MediaWiki
Revision:	< r49653‎ \| r49654 \| r49655 >
Date:	23:48, 19 April 2009
Author:	skizzerz
Status:	resolved (Comments)
Tags:
Comment:	Follow-up on r49330 * re-add $wgRestrictDisplayTitle * revert r49610 * prevent block-level and other such tags from being used in DISPLAYTITLE (while still allowing tags such as <sup> and <sub>)
Modified paths:	/trunk/phase3/RELEASE-NOTES (modified) (history) /trunk/phase3/includes/DefaultSettings.php (modified) (history) /trunk/phase3/includes/Sanitizer.php (modified) (history) /trunk/phase3/includes/parser/CoreParserFunctions.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/parser/CoreParserFunctions.php
—	—	@@ -236,13 +236,25 @@
237	237	* @param string $text Desired title text
238	238	* @return string
239	239	*/
240		~~- static function displaytitle( $parser, $displayTitle = '' ) {~~
	240	+ static function displaytitle( $parser, $text = '' ) {
	241	+ global $wgRestrictDisplayTitle;
	242	+
	243	+ #list of disallowed tags for DISPLAYTITLE
	244	+ #these will be escaped even though they are allowed in normal wiki text
	245	+ $bad = array( 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'blockquote', 'ol', 'ul', 'li',
	246	+ 'table', 'tr', 'th', 'td', 'dl', 'dd', 'caption', 'p', 'ruby', 'rb', 'rt', 'rp' );
	247	+
241	248	#only requested titles that normalize to the actual title are allowed through
242	249	#mimic the escaping process that occurs in OutputPage::setPageTitle
243		~~- $title = Title::newFromText( Sanitizer::stripAllTags( Sanitizer::normalizeCharReferences( Sanitizer::removeHTMLtags( $displayTitle ) ) ) );~~
	250	+ $text = Sanitizer::normalizeCharReferences( Sanitizer::removeHTMLtags( $text, null, array(), array(), $bad ) );
	251	+ $title = Title::newFromText( Sanitizer::stripAllTags( $text ) );
244	252
245		~~- if ( $title instanceof Title && $title->getFragment() == '' && $title->equals( $parser->mTitle ) ) {~~
246		~~- $parser->mOutput->setDisplayTitle( $displayTitle );~~
	253	+ if( !$wgRestrictDisplayTitle ) {
	254	+ $parser->mOutput->setDisplayTitle( $text );
	255	+ } else {
	256	+ if ( $title instanceof Title && $title->getFragment() == '' && $title->equals( $parser->mTitle ) ) {
	257	+ $parser->mOutput->setDisplayTitle( $text );
	258	+ }
247	259	}
248	260
249	261	return '';
Index: trunk/phase3/includes/Sanitizer.php
—	—	@@ -338,9 +338,11 @@
339	339	* @param string $text
340	340	* @param callback $processCallback to do any variable or parameter replacements in HTML attribute values
341	341	* @param array $args for the processing callback
	342	+ * @param array $extratags for any extra tags to include
	343	+ * @param array $removetags for any tags (default or extra) to exclude
342	344	* @return string
343	345	*/
344		~~- static function removeHTMLtags( $text, $processCallback = null, $args = array(), $extratags = array() ) {~~
	346	+ static function removeHTMLtags( $text, $processCallback = null, $args = array(), $extratags = array(), $removetags = array() ) {
345	347	global $wgUseTidy;
346	348
347	349	static $htmlpairs, $htmlsingle, $htmlsingleonly, $htmlnest, $tabletags,
—	—	@@ -377,8 +379,10 @@
378	380	'li',
379	381	);
380	382
381		~~- $htmlsingleallowed = array_merge( $htmlsingle, $tabletags );~~
382		~~- $htmlelements = array_merge( $htmlsingle, $htmlpairs, $htmlnest );~~
	383	+ $htmlsingleallowed = array_unique( array_merge( $htmlsingle, $tabletags ) );
	384	+ # Only allow elements that aren't specified in $removetags
	385	+ # Doing it here since this is the top-level check
	386	+ $htmlelements = array_diff( array_unique( array_merge( $htmlsingle, $htmlpairs, $htmlnest ) ), $removetags );
383	387
384	388	# Convert them all to hashtables for faster lookup
385	389	$vars = array( 'htmlpairs', 'htmlsingle', 'htmlsingleonly', 'htmlnest', 'tabletags',
Index: trunk/phase3/includes/DefaultSettings.php
—	—	@@ -3463,6 +3463,11 @@
3464	3464	$wgAllowDisplayTitle = true;
3465	3465
3466	3466	/**
	3467	+ * for consistency, restrict DISPLAYTITLE to titles that normalize to the same canonical DB key
	3468	+ */
	3469	+$wgRestrictDisplayTitle = true;
	3470	+
	3471	+/**
3467	3472	* Array of usernames which may not be registered or logged in from
3468	3473	* Maintenance scripts can still use these
3469	3474	*/
Index: trunk/phase3/RELEASE-NOTES
—	—	@@ -26,15 +26,12 @@
27	27	* Added $wgNoFollowDomainExceptions to allow exempting particular domain names
28	28	from rel="nofollow" on external links
29	29	* (bug 12970) Brought back $wgUseImageResize.
30		~~-* Added $wgRedirectOnLogin to allow specifying a page to redirect users to upon~~
31		~~- logging in (for example, "Main Page")~~
	30	+* Added $wgRedirectOnLogin to allow specifying a specifc page to redirect users
	31	+ to upon logging in (ex: "Main Page")
32	32	* Add $wgExportFromNamespaces for enabling/disabling the "export all from
33	33	namespace" option (disabled by default)
34	34	* (bug 18222) $wgMinimalPasswordLength default is now 1
35	35	* $wgSessionHandler can be used to configure session.save_handler
36		~~-* Removed $wgRestrictDisplayTitle, in effect permanently setting it to true.~~
37		~~- Without this variable, the DISPLAYTITLE magic word will only accept titles~~
38		~~- that are equivalent to the actual page title.~~
39	36
40	37	=== New features in 1.15 ===
41	38

Follow-up revisions

Revision	Commit summary	Author	Date
r49720	* Follow up r49654: move the $extratags and $removetags processing outside of...	skizzerz	21:54, 21 April 2009

Past revisions this follows-up on

Revision	Commit summary	Author	Date
r49330	(bug 12998) Allow <sup>, <sub>, etc. in DISPLAYTITLE	rememberthedot	05:15, 9 April 2009
r49610	Follow-up to r49330: removed $wgRestrictDisplayTitle from DefaultSettings.php	rememberthedot	00:56, 18 April 2009

Comments

#Comment by Remember the dot (talk | contribs) 00:32, 20 April 2009

removeHTMLtags now catches invalid tags in the title <h1>, which is good, but other <h1>s in the page are still not being sanitized. For example:

<h1>
    <ol>
        <li>Evel</li>
        <li>Knievel</li>
    </ol>
</h1>

Also...why are 'ruby', 'rb', 'rt', and 'rp' in the list of bad tags? These are not valid HTML tags, so the parser should automatically escape them without being asked.

#Comment by Skizzerz (talk | contribs) 00:56, 20 April 2009

ruby and the like were inside the Sanitizer method that allows specific HTML tags, and they were on the allowed list. Also, they are indeed valid HTML tags, it's just that many browsers don't support them.

As for the h1 in wiki text, that's an issue with the Parser and not with the displaytitle function, so it's unrelated (although it now should be fixable)

#Comment by Remember the dot (talk | contribs) 01:06, 20 April 2009

The HTML specification (http://www.w3.org/TR/html401/index/elements.html) doesn't list any Ruby tags. These custom tags should not be making it through the sanitizer.

I had thought that the most logical way to do h1 sanitation would be to give removeHTMLtags context information (like "block" or "h1") and have it do the removal as though it had encountered an h1 tag within the page body, perhaps with some additional restrictions. That way, we wouldn't be duplicating the list of tags forbidden in h1 in both the displaytitle function and the removeHTMLtags function.

#Comment by Brion VIBBER (talk | contribs) 20:41, 28 April 2009

http://www.w3.org/TR/ruby/

Currently only MSIE supports it that I know of, which kind of sucks. :)

#Comment by Nikerabbit (talk | contribs) 06:40, 21 April 2009

This breaks a lot of things. See http://translatewiki.net/sandwiki/index.php?title=Portal:Fi

#Comment by Skizzerz (talk | contribs) 21:56, 21 April 2009

Fixed in r49720

Status & tagging log

17:19, 3 June 2009 Tim Starling (talk | contribs) changed the status of r49654 [removed: fixme added: resolved]
06:40, 21 April 2009 Nikerabbit (talk | contribs) changed the status of r49654 [removed: new added: fixme]