r111263 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r111262‎ | r111263 | r111264 >
Date:21:33, 11 February 2012
Author:wikinaut
Status:deferred (Comments)
Tags:
Comment:
initial import of new extension EtherpadLite
Modified paths:
  • /trunk/extensions/EtherpadLite (added) (history)
  • /trunk/extensions/EtherpadLite/EtherpadLite.i18n.php (added) (history)
  • /trunk/extensions/EtherpadLite/EtherpadLite.php (added) (history)

Diff [purge]

Index: trunk/extensions/EtherpadLite/EtherpadLite.php
@@ -0,0 +1,135 @@
 2+<?php
 3+/**
 4+ * EtherpadLite extension
 5+ *
 6+ * @file
 7+ * @ingroup Extensions
 8+ *
 9+ * The extension adds a tag "eplite" to the MediaWiki parser and
 10+ * provides a method to embed Etherpad Lite pads on MediaWiki pages.
 11+ * The Etherpad Lite server is not part of the extension.
 12+ *
 13+ * Usage:
 14+ *
 15+ * <eplite pad-id="padid" />
 16+ * <eplite pad-id="myPseudoSecretPadHash-7ujHvhq06g" />
 17+ * <eplite pad-id="padid" height="200px" width="600px" />
 18+ *
 19+ * Installation:
 20+ *
 21+ * Add the following lines in LocalSettings.php:
 22+ *
 23+ * require_once( "$IP/extensions/EtherpadLite/EtherpadLite.php" );
 24+ * $wgEtherpadLiteDefaultPadUrl = "http://www.your-pad-server.org/p/";
 25+ * $wgEtherpadLiteDefaultWidth = "600px";
 26+ * $wgEtherpadLiteDefaultHeigth = "400px";
 27+ *
 28+ * Prerequisite:
 29+ *
 30+ * Etherpad Lite host server (example)
 31+ * $wgEtherpadLiteDefaultPadUrl = "http://www.example.com/p/";
 32+ *
 33+ * For setting up your own Etherpad Lite server (based on node.js) see
 34+ * Etherpad Lite homepage https://github.com/Pita/etherpad-lite
 35+ *
 36+ * This extension is based on:
 37+ *
 38+ * https://github.com/johnyma22/etherpad-lite-jquery-plugin
 39+ * https://github.com/Pita/etherpad-lite/wiki/Embed-Parameters
 40+ *
 41+ * The present MediaWiki extension does not require jquery. It adds an iframe.
 42+ *
 43+ * @author Thomas Gries
 44+ * @license GPL v2
 45+ * @license MIT
 46+ *
 47+ * Dual licensed under the MIT and GPL licenses:
 48+ * http://www.opensource.org/licenses/mit-license.php
 49+ * http://www.gnu.org/licenses/gpl.html
 50+ *
 51+ */
 52+
 53+// Check environment
 54+if ( !defined( 'MEDIAWIKI' ) ) {
 55+ echo( "This is an extension to MediaWiki and cannot be run standalone.\n" );
 56+ die( - 1 );
 57+}
 58+
 59+// Credits
 60+$wgExtensionCredits['other'][] = array(
 61+ 'path' => __FILE__,
 62+ 'name' => 'EtherpadLite',
 63+ 'author' => array( 'Thomas Gries' ),
 64+ 'version' => '1.00 20120211',
 65+ 'url' => '//www.mediawiki.org/wiki/Extension:EtherpadLite',
 66+ 'descriptionmsg' => 'etherpadlite-desc',
 67+);
 68+
 69+$dir = dirname( __FILE__ ) . '/';
 70+
 71+$wgExtensionMessagesFiles['EtherpadLite'] = $dir . 'EtherpadLite.i18n.php';
 72+$wgHooks['ParserFirstCallInit'][] = 'efEtherpadLiteParser_Initialize';
 73+
 74+// https://www.mediawiki.org/wiki/Manual:Tag_extensions
 75+function efEtherpadLiteParser_Initialize( &$parser ) {
 76+ $parser->setHook('eplite', 'efEtherpadLiteParserFunction_Render');
 77+ return true;
 78+}
 79+
 80+# Define a default Etherpad Lite server Url and base path
 81+# this server is used unless a distinct server is defined by pad-id="..."
 82+$wgEtherpadLiteDefaultPadUrl = "http://www.example.com/p/";
 83+$wgEtherpadLiteDefaultWidth = "300px";
 84+$wgEtherpadLiteDefaultHeigth = "200px";
 85+$wgEtherpadLiteMonospacedFont = 'false';
 86+$wgEtherpadLiteShowControls = 'true';
 87+$wgEtherpadLiteShowLineNumbers = 'true';
 88+$wgEtherpadLiteShowChat = 'true';
 89+$wgEtherpadLiteShowAuthorColors = 'true';
 90+
 91+function efEtherpadLiteParserFunction_Render( $input, $args, $parser, $frame ) {
 92+
 93+ global $wgUser;
 94+ global $wgEtherpadLiteDefaultPadUrl,$wgEtherpadLiteDefaultWidth,
 95+ $wgEtherpadLiteDefaultHeigth,$wgEtherpadLiteMonospacedFont,$wgEtherpadLiteShowControls,
 96+ $wgEtherpadLiteShowLineNumbers,$wgEtherpadLiteShowChat,$wgEtherpadLiteUserName,
 97+ $wgEtherpadLiteShowAuthorColors;
 98+
 99+ $padId = ( !empty( $args['pad-id'] ) ) ? $args['pad-id'] : "" ;
 100+ $height = ( !empty( $args['height'] ) ) ? $args['height'] : $wgEtherpadLiteDefaultHeight;
 101+ $width = ( !empty( $args['width'] ) ) ? $args['width'] : $wgEtherpadLiteDefaultWidth;
 102+
 103+ $useMonospaceFont = ( !empty( $args['monospaced-font'] ) ) ? $args['monospaced-font'] : $wgEtherpadLiteMonospacedFont;
 104+ $showControls = ( !empty( $args['show-controls'] ) ) ? $args['show-controls'] : $wgEtherpadLiteShowControls;
 105+ $showLineNumbers = ( !empty( $args['show-linenumbers'] ) ) ? $args['show-linenumbers'] : $wgEtherpadLiteShowLineNumbers;
 106+ $showChat = ( !empty( $args['show-chat'] ) ) ? $args['show-chat'] : $wgEtherpadLiteShowChat;
 107+ $noColors = ! ( ( !empty( $args['show-colors'] ) ) ? $args['show-colors'] : $wgEtherpadLiteShowAuthorColors );
 108+
 109+ $epliteHostUrl = Sanitizer::cleanUrl (
 110+ ( !empty( $args['pad-url'] ) ) ? $args['pad-url'] : $wgEtherpadLiteDefaultPadUrl
 111+ );
 112+
 113+ // preset the pad username from MediaWiki username or IP
 114+ // attention:
 115+ // the pad username can currently be overwritten when editing the pad
 116+
 117+ $userName = $wgUser->getName();
 118+
 119+ $iframeAttributes = array(
 120+ "style" => "width:$width;height:$height",
 121+ "id" => "epframe$padId",
 122+ "src" => "$epliteHostUrl/$padId" .
 123+ "?showControls=$showControls" .
 124+ "&showChat=$showChat" .
 125+ "&showLineNumbers=$showLineNumbers" .
 126+ "&useMonospaceFont=$useMonospaceFont" .
 127+ "&userName=$userName" .
 128+ "&noColors=$noColors"
 129+ );
 130+
 131+ $output = Html::rawElement( 'iframe', $iframeAttributes );
 132+
 133+ wfDebug( "EtherpadLite:efEtherpadLiteParserFunction_Render $output\n" );
 134+ return array( $output, 'noparse' => true, 'isHTML' => true );
 135+
 136+}
Property changes on: trunk/extensions/EtherpadLite/EtherpadLite.php
___________________________________________________________________
Added: svn:mime-type
1137 + text/x-php
Added: svn:keywords
2138 + Author Date Id Rev URL
Added: svn:eol-style
3139 + native
Index: trunk/extensions/EtherpadLite/EtherpadLite.i18n.php
@@ -0,0 +1,16 @@
 2+<?php
 3+/**
 4+ * Internationalisation file for extension EtherpadLite
 5+ *
 6+ * @file
 7+ * @ingroup Extensions
 8+ */
 9+
 10+$messages = array();
 11+
 12+/** English
 13+ * @author Thomas Gries
 14+ */
 15+$messages['en'] = array(
 16+ 'etherpadlite-desc' => 'Provides a method to embed one or many Etherpad Lite pads (which are hosted on local or external Etherpad Lite server/s) on MediaWiki pages. It adds an &lt;eplite&gt; parser tag.',
 17+);
Property changes on: trunk/extensions/EtherpadLite/EtherpadLite.i18n.php
___________________________________________________________________
Added: svn:mime-type
118 + text/x-php
Added: svn:keywords
219 + Author Date Id Rev URL
Added: svn:eol-style
320 + native

Follow-up revisions

RevisionCommit summaryAuthorDate
r111284first follow-up r111263 . fixing issue 1 &, 2 empty, 4 'true', 7 (same padid=...wikinaut07:42, 12 February 2012
r111285follow-up r111284 r111263 . wfEtherpadLiteStringFromTestedBoolean now returns...wikinaut08:03, 12 February 2012
r111416r111263: Consistency tweaks in preparation for adding extension to translatew...raymond20:31, 13 February 2012
r111417r111263: Register extension for translatewiki.net.raymond20:32, 13 February 2012

Comments

#Comment by Bawolff (talk | contribs)   00:26, 12 February 2012

Some comments:

function efEtherpadLiteParser_Initialize( &$parser )

Doing the & in &$parser isn't needed anymore as far as I am aware. (OTOH, it doesn't hurt anything either)

The empty function is generally discouraged in the coding conventions.

$wgEtherpadLiteDefaultPadUrl    = "http://www.example.com/p/";

One alternative possibility is to do $wgEtherpadLiteDefaultPadUrl = null instead as default, and when you're about to display the pad, check for null and yell at user to configure the extension if the variable is null.

$wgEtherpadLiteShowControls     = 'true';
...

I'm not really a fan of using a string true. It would be better to use a boolean true value, and then convert that to a string when making the iframe url parameters.

 $padId            = ( !empty( $args['pad-id'] ) ) ? $args['pad-id'] : "" ;
 ...

There's no validation of user input here. I'm not sure how much evil can be accomplished via the url parameters. Worst case scenario would be of course if etherpad had an xss vulnrability in terms of its url parameters (Of course you have to allow most input here because pads can be named everything, but should at least check for & since those separate url parameters.

 ( !empty( $args['pad-url'] ) ) ? $args['pad-url'] : $wgEtherpadLiteDefaultPadUrl

this is kind of scary, an evil user can then embed an iframe to an arbitrary site. This could perhaps be used to do some kind of clickjacking attack against some other site, track users of a wiki, other evil stuff etc.

This is also an XSS attack by doing something like <eplite pad-url="javascript:alert(1)//"/>

+		"id"         => "epframe$padId",

Minor issue, but user could potentially make an id that is not a valid one. (HTML has some restrictions on ids if i recall). Also what if someone embeds the same pad twice (ids must be unique)?

$userName  = $wgUser->getName();

No garuntee the user who parsed the page is the one viewing the page. Either need to call $parser->disableCache(), add the username process at some later stage (like use js, or possibly some hook that runs right before outputting the page).

"style"      => "width:$width;height:$height",

CSS needs to be sanitized (There's a method somewhere in the Sanitizer for that i think). Certain browsers can be caused to execute js via certain css rules.

return array( $output, 'noparse' => true, 'isHTML' => true );

That's the return value format for parser functions not parser tags. Parser tags you just return the html as a string. (There is an extended return value system for parser tags where you can specify markerType for example, but the element names of the array are different from the parser function return value)

$wgExtensionCredits['other'][]

Generally this would probably be considered a "hook" extension not an "other" extension.

Cheers.

p.s. You should perhaps coordinate with NeilK. If i recall he was interested in etherpad <-> mediawiki stuff at one point.

#Comment by Bawolff (talk | contribs)   00:32, 12 February 2012

Also two other things:

Might be nicer to use wfAppendQuery() instead of constructing the url with string concatenation, and for the boolean parameters (!empty( $args['show-controls'] ) )...) it should definitly validate that they are either true or false.

#Comment by Wikinaut (talk | contribs)   01:16, 12 February 2012

Thanks for your deep analysism, which I appreciate very much.

As you might have noticed, I already use Sanitizer::cleanUrl and Html:rawElement methods as mentioned in the Security Guides.

Regarding the pad-id, and the query string parameters - which are then forming part of the url - I now think of sanitizing the whole "src" parameter like in this example

"src" => Sanitzer::cleanUrl("$epliteHostUrl/$padId" . "?showControls=$showControls" . "&showChat=$showChat" . "&showLineNumbers=$showLineNumbers" . "&useMonospaceFont=$useMonospaceFont" . "&userName=$userName" . "&noColors=$noColors"),


If this works, will it fulfill the mw requirements for a sane code ?

The Html:rawElement was mentioned as a good way to sanitize here https://www.mediawiki.org/wiki/Security_checklist_for_developers#Output_.28API.2C_CSS.2C_JavaScript.2C_HTML.2C_XML.2C_etc..29 , so I thought, it is enough to use this method.

Pls. correct me, if I am wrong.

#Comment by Bawolff (talk | contribs)   02:55, 12 February 2012
The Html:rawElement was mentioned as a good way to sanitize here [https://www.mediawiki.org/wiki/Security_checklist_for_developers#Output_.28API.2C_CSS.2C_JavaScript.2C_HTML.2C_XML.2C_etc..29 https://www.mediawiki.org/wiki/Security_checklist_for_developers#Output_.28API.2C_CSS.2C_JavaScript.2C_HTML.2C_XML.2C_etc..29] , so I thought, it is enough to use this method. 

That page was misleading (I changed it). Html::rawElement makes sure your attributes are encoded (So people don't do crap like width='foo" ><script>doEvil()...'. They don't look at the actual contents of the attribute. Hence, style attributes have to be further filtered with Sanitizer::checkCss (or Sanitizer::validateAttributes whic will do stuff for some other attributes in addition to style ) if the CSS comes from the user.


Sanitizer::cleanUrl doesn't do a whole lot for security in the context of your extension (It basically just normalizes entity references to stop people from getting around spam blacklists, which to be honest I'm not even sure spam blacklists would be checked for a parser hook attribute). The most scary part is allowing users to override the pad url - embedding iframe to a potentially evil site opens up users to at the very least phising attacks (Because it's embedded into the page, looks like part of the page, evil person could set up a website asking for users passwords, etc.

wfAppendUrl would probably help somewhat with security of building the url parameters, since it would prevent someone from passing a parameter to the other server other then the one's that you explicitly want them to (Of course in general, this extension is working on the assumption that the etherpad server is not evil, so the url paramter validation is not super-critical...). The best thing though would to check what all the valid values for those parameters could be, and make sure the user cannot pass in anything but those.


p.s. I just noticed there is a typo:

$wgEtherpadLiteDefaultHeigth

Should be

$wgEtherpadLiteDefaultHeight

Cheers.

#Comment by Wikinaut (talk | contribs)   06:47, 12 February 2012

Le me answer the uncritical question "what if someone embeds the same pad twice (ids must be unique)? ".

A: This is not a problem; the client then connects to both pads, i.e. you have two windows and can type in one of the two and see the output in both; it was my first test when I started embedding epl in mw.

#Comment by Bawolff (talk | contribs)   20:44, 12 February 2012

The issue I'm concerned about is this results in invalid html, since the same id cannot appear twice in one page. So if someone embeds two pads in the same page, there will be two iframes in the page, both with the id eplite-iframe-, which will cause the html not to be valid. Since you're not using the id attribute for anything, it might make sense just not to put an id on the iframes. (From a pragmatic perspective, the invalid html won't hurt anything, it will just cause the validator to whine)

#Comment by Wikinaut (talk | contribs)   20:53, 12 February 2012

"...since the same id cannot appear twice in one page."

Of course, you are 100% right, I overlooked this (never ignored it, but today.) Will use, as suggested by you, "class=" instead.

"Since you're not using the id attribute for anything":

This is just for the case, that another extension will have the need to manipulate that specific frame. I took it over from the example "Etherpad lite and jQuery" https://github.com/johnyma22/etherpad-lite-jquery-plugin/blob/master/js/etherpad.js where the content also can be read. I did not implement this part in the current version.

#Comment by Wikinaut (talk | contribs)   20:58, 12 February 2012

I filed this as issue on https://github.com/johnyma22/etherpad-lite-jquery-plugin/issues/8 so that the epl people are informed to also replace id= by class= in their etherpad.js example.

#Comment by P858snake (talk | contribs)   06:50, 12 February 2012
+	'url' => '//www.mediawiki.org/wiki/Extension:EtherpadLite',

Should be a direct HTTPS link based on the standards we seem to be using now.

#Comment by Wikinaut (talk | contribs)   07:07, 12 February 2012

// ==> https:// Are you really sure, I thought we have to use "//" ?

#Comment by P858snake (talk | contribs)   07:36, 12 February 2012

Yes, Brion went thought and changed them awhile ago, Because otherwise it's basically pointless (not many people seem to run their wikis on Https)

#Comment by Wikinaut (talk | contribs)   07:05, 12 February 2012

Re: "Doing the & in &$parser isn't needed anymore as far as I am aware. (OTOH, it doesn't hurt anything either) "

I verbatim copied that "&" from https://www.mediawiki.org/wiki/Manual:Tag_extensions - wanted to fix this now but found luckily: you did already, thanks.

#Comment by Wikinaut (talk | contribs)   19:05, 12 February 2012

@Bawolff: in r111313 I fixed (I think) all of the issues.

Let me explain here: 1. I changed the attribute names from pad-id => id and pad-url => src. This allows to treat (id, src, height, width) as standard attributes and to use

$sanitizedAttributes = Sanitizer::validateAttributes( $args, array ( "width", "height", "id", "src" ) );

for this, and then to use $santizedAttributes['src'] for example.

The boolean parameters are manually checked.

Please have a look to r111313 . Documentation https://www.mediawiki.org/wiki/Extension:EtherpadLite has been updated as well.

#Comment by Bawolff (talk | contribs)   19:54, 12 February 2012

You're getting closer, but there's still some issues. (From r111313)

+	
+	$sanitizedAttributes = Sanitizer::validateAttributes( $args, array ( "width", "height", "id", "src" ) );

You really shouldn't use validateAttributes for the individual width and height elements, unless you put them on an iframe as the old-style indvidual width and height attributes. (This is because the html width attribute is interpreted differently then the style attribute by web browsers, so different things are evil (or really, its only possible to do evil things with the style attribute...)). If you're going to use validateAttributes, run it on the fully constructed array of attributes ( $iframeAttributes ) so the Sanitizer see's exactly what attributes you're going to pass to Html::rawElement.


Some other minor issues:

$noColors         = ! ( wfEtherpadLiteStringFromTestedBoolean( $args['show-colors'], $wgEtherpadLiteShowAuthorColors ) );

The ! mark operator working on a string result of wfEtherpadLiteStringFromTestedBoolean might do different things then what you expect. (Since (!"false") === false

$args['id']       = ( isset( $args['id'] ) ) ? $args['id'] : "";

Any particular reason why you set an id attribute at all (instead of say a class attribute that could be non-unique and the same for all the iframes). Getting users to ensure the id is always unique is probably not ideal

I'm still very nervous about allowing just any website as a source. The potential bad is as following (Even with nowiki, CR messes with my urls... pretend that is not using wiki external url syntax):

<eplite src="[http://example.com/website-that-tells-user-they-have-been-logged-out-and-asks-for-password http://example.com/website-that-tells-user-they-have-been-logged-out-and-asks-for-password]" width="200px;border:none;overflow:hidden"/>

It's a lot less convincing if width is validated fully (You might want to consider also stripping semi-colons from width in addition to Sanitizer related stuff), but still, that could easily trick gullible users to revealing there password to some third party. If you really want to support multiple etherpad sites, perhaps the best way would be to have another config variable that has a list of alternative allowed etherpad websites in addition to the default.

Last of all, some of the checks for various args (introduced in r111313) need to use isset to prevent Notice: Undefined index: show-colors in /var/www/w/extensions/EtherpadLite/EtherpadLite.php on line 116.

Cheers.

#Comment by Wikinaut (talk | contribs)   19:59, 12 February 2012

"I'm still very nervous about allowing just any website as a source."

I was and I am also. Do you have a proposal for me ?

#Comment by Wikinaut (talk | contribs)   20:01, 12 February 2012

the "id" replaced my original "pad-id" which is the unique identifier on a pad-server (src=pad-server url). "id" is not a class.

#Comment by Bawolff (talk | contribs)   20:52, 12 February 2012

I mean the id for the iframe. At the moment, the iframe id is set to the same thing as the pad id (earlier i didn't realize that that value was also used for what pad to get, which is why my comment is a little confusing. I thought it was solely for setting the id attribtue on the iframe). id's on html elements have to be unique (in a page), so it might be best just not to put an id attribute on the iframe. (I mentioned the class attribute, because typically classes are used for the same purpose as the id attribute, but don't have the uniqueness requirement).

#Comment by Wikinaut (talk | contribs)   20:07, 12 February 2012

"Last of all, some of the checks for various args (introduced in r111313) need to use isset to prevent Notice: Undefined index: show-colors "

Arrrghh. mea culpa. I put the isset() into the function where I passed the "unset value" to - which was so stupid (shame on me).

#Comment by Hashar (talk | contribs)   19:44, 12 February 2012

That is a good extension idea Wikinaut. Well done 8-)

#Comment by Wikinaut (talk | contribs)   20:03, 12 February 2012

Hashar, perhaps you can contact me to help me to solve the remaining issues, which I fully understand. Unfortunately, I lost the overview, what MW methods can be used best for sanitizing.

Any help is welcome, either here or by mail (you have my e-mail address)

Status & tagging log