r58714 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r58713‎ | r58714 | r58715 >
Date:15:45, 7 November 2009
Author:daniel
Status:ok (Comments)
Tags:
Comment:
better pattern for detecting evil scripts in rdfa attributes
Modified paths:
  • /trunk/phase3/includes/Sanitizer.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/Sanitizer.php
@@ -57,6 +57,11 @@
5858 )?(?=$space|\$)/sx" );
5959
6060 /**
 61+ * Regular expression to match URIs that could trigger script execution
 62+ */
 63+define( 'MW_SCRIPT_URL_PATTERN', '/(^|\s)(javascript|vbscript)[^\w]/i' );
 64+
 65+/**
6166 * List of all named character entities defined in HTML 4.01
6267 * http://www.w3.org/TR/html4/sgml/entities.html
6368 * @private
@@ -631,7 +636,7 @@
632637 $attribute === 'about' || $attribute === 'property' || $attribute === 'resource' ||
633638 $attribute === 'datatype' || $attribute === 'typeof' ) {
634639 //Paranoia. Allow "simple" values but suppress javascript
635 - if ( preg_match( '/(^|\s)javascript\s*:/i', $value ) ) {
 640+ if ( preg_match( MW_SCRIPT_URL_PATTERN, $value ) ) {
636641 continue;
637642 }
638643 }

Comments

#Comment by Simetrical (talk | contribs)   00:01, 8 November 2009

Surely a blacklist-based solution is a bad idea here? Anyone can make up a new protocol that allows evil things to be done if you follow the URL. There's at least one URL protocol that can send an IM to someone without further user intervention if you click a link, IIRC. Shouldn't this use $wgUrlProtocols as a whitelist, if it would be dangerous to allow evil URLs?

On the other hand, are RDFa URLs actually followed, or just used as identifiers? I'm not too familiar with RDF, but my understanding is the latter. In that case, there's no point in a blacklist or whitelist, just allow anything.

#Comment by Duesentrieb (talk | contribs)   00:11, 8 November 2009

the problem with using wgUrlProtocols is that the value may not be a full url. plain values or things using a namespace prefix like foo:bar may be used. it's a bit hard to identify that case.

And yes, in a browser, generally these are only used as identifiers and not followed. but tools that are aware of rdfa may extract them and do who knows what with them. so it seems prudent to at least filter out the most obvioulsy scarry stuff.

#Comment by Tim Starling (talk | contribs)   00:45, 10 February 2010

Unless there's an actual client that runs scripts, I'm marking this as OK for now.

#Comment by Dantman (talk | contribs)   14:52, 14 August 2011

Blacklisting bad urls is completely inadequate for actual security. There are numerous ways to bypass that kind of blacklisting and embed a javascript: url. http://ha.ckers.org/xss.html

If its used out of pure paranoia and not actually part of security then that definition better come with a big fat warning that it should never be used to blacklist in areas where security actually matters.

#Comment by Simetrical (talk | contribs)   15:20, 14 August 2011

The check should really just be removed entirely . . .

Status & tagging log