r46628 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r46627‎ | r46628 | r46629 >
Date:00:46, 31 January 2009
Author:werdna
Status:reverted (Comments)
Tags:
Comment:
Don't allow padding parser functions to be used to truncate strings. This breaks in some situations, and also encourages its use to reconstruct StringFunctions with core parser functions, which is undesirable for performance and usability reasons.
Modified paths:
  • /trunk/phase3/includes/parser/CoreParserFunctions.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/parser/CoreParserFunctions.php
@@ -333,6 +333,7 @@
334334 static function pad( $string, $length, $padding = '0', $direction = STR_PAD_RIGHT ) {
335335 $lengthOfPadding = mb_strlen( $padding );
336336 if ( $lengthOfPadding == 0 ) return $string;
 337+ if ( $length < mb_strlen( $string ) ) return $string;
337338
338339 # The remaining length to add counts down to 0 as padding is added
339340 $length = min( $length, 500 ) - mb_strlen( $string );

Follow-up revisions

RevisionCommit summaryAuthorDate
r46639Fix r46628 -- I'd misunderstood the nature of the hack. People wanted to appe...werdna11:43, 31 January 2009
r47411Revert r46628, r46639 for now "Don't allow padding parser functions to be use...brion02:24, 18 February 2009

Comments

#Comment by Dragons flight (talk | contribs)   01:11, 31 January 2009

This is not correct. The construction this diff is responding to involves "pad( , $length, 'Some_string' )" where people are pulling $length characters off of $padding and appending them to an empty string in order to truncate whatever was passed in $padding.

I would suggest that:

if( $string == && mb_strlen( $padding ) > 1 ) return $string;

might cause a minimum of breakage.

#Comment by Dragons flight (talk | contribs)   01:13, 31 January 2009

D'oh. The two places where the italics kick in are of course meant to denote empty strings. Silly double quote notation.

#Comment by Happy-melon (talk | contribs)   10:30, 31 January 2009

Or given that we've already calculated that length and stuck it in a varaible:

if( $string == '' && $lengthOfPadding > $length ) return $string;

I maintain that it makes no sense to handle this exception *except* as a knee-jerk reaction to its use in truncation; this essentially removes functionality for "performance" reasons. If those reasons are genuine, surely they'll be being picked up by someone with access to the server load logs, not by a junior dev who saw something on VPT? I'm not trying to put you down here, Werdna, but this strikes me as a knee-jerk reaction that doesn't have a very obvious justification. This change will break pages on en.wikisource, as noted on VPT; it's also used widely enough on en.wiki (see w:Special:WhatLinksHere/Template:Strlen) to be dangerous there. This functionality is used and is useful; I agree that it is a hack and should be fixed, but not before alternative functionality is in place. The NAMESPACE: magic words are a start, but that won't help en.wikisource. We need StringFunctions. Taking away the only approximation without making properly solving the underlying problem strikes me as a little sadistic :D

#Comment by Werdna (talk | contribs)   11:55, 31 January 2009

Yes, I'd apparently misinterpreted how the hack worked – I thought it was basically the reverse, with the string to be truncated being truncated as a consequence of being padded to a length shorter than its actual length. This has been sorted out in r46639. A bit of quick debugging and thinking on my test wiki led to code almost the same as Happy-melon's.

The problem with the template hacks is not necessarily that they are, by themselves, poor for performance (although they do add nontrivial time to parse time), rather that they encourage and permit a complete reimplementation of StringFunctions in template syntax. It was mostly the mention of this that triggered myself and other developers to consider 'nipping it in the bud'.

You've asserted that I'm not qualified to make a determination on something like this. Putting the relevance of such an assertion to one side, the change was suggested by Tim Starling, and that I have the support of at least one other system administrator (Domas Mituzas). Any suggestion that this decision was made by a lone 'junior' developer on a whim is a misunderstanding.

With all of these points aside, the final decision on whether to put this commit live will be made by the CTO Brion Vibber. He is certainly qualified to make such a determination.

#Comment by Happy-melon (talk | contribs)   14:09, 31 January 2009

I'm more saying that you're not in a position to make the decision based on the information you have available: that is, you can't justify a commit on performance reasons when you don't have access to the data that would tell you whether the performance hit was a significant issue. Obviously if this idea comes from Tim, that's a completely different kettle of fish, doubly so from Brion. But there was no reason for me to think otherwise until you mentioned it.

I agree fully with your comment about the issues with template hacks, and as I said above, I agree completely that this issue should be resolved. I'm concerned about the impression that's being created of "ooh wiki users have been doing something naughty, better use our godly powers to stop them", when the focus should be on why the nasty hacks have been implemented. Wiki users haven't gone to such outrageous lengths to hack this functionality purely for the sake of it, they've done it because it is a needed and useful feature that they want access to, and because they've lost patience with waiting for the devs to implement a 'proper' version for them. To now take off my editor hat and put on my uber-junior-dev one, I feel that 'our' focus should be on delivering functionality in the most efficient and progressive way possible, not on closing technical loopholes. One way or another, the code has allowed the implementation of string slicing functionality, which wikis have used to imitate functionality that the devs have been slow in implementing. To add new code with the explicit intention of catching and preventing this behavior is essentially a deliberate functionality regression unless an alternative method is implemented. Wikimedia is not going to explode if this commit waits until StringFunctions are either installed as an extension or integrated into core.

#Comment by Remember the dot (talk | contribs)   18:47, 31 January 2009

Using padright to grab the first n characters of a string adds only a little more overhead than using a regular substring function. The issue seems to be that it's possible to do stuff like this:

Template:Charat
{{charat guts|{{padright:|{{#expr:{{{2}}}+1}}|{{{1}}}}}|{{padright:|{{{2}}}|{{{1}}}}}}}
Template:Charat guts
{{#switch:{{{1}}}
|{{{2}}}a=a
|{{{2}}}b=b
|{{{2}}}c=c
}}

While this may seem useful on the surface, it quickly becomes apparent that a template like this is impractical. It breaks with whitespace, pipe characters, and equals signs, which eliminates a wide variety of uses right off the bat. And you'd have to code in every allowed character. A template that handled all UTF-8 characters allowed in XML would take up more than 7 MB -- I'm not even sure you can make a page that big.

In other words, we don't even want to do the thing you're so afraid that we'll do. And if you're concerned about malicious users, there's far worse things they could do, like insert {{#time:j F|+1000000000000 days}} into a widely-used template.

#Comment by Remember the dot (talk | contribs)   07:27, 2 February 2009

Well, I suppose the problems with pipes and equals signs can be resolved with {{!}} and {{=}}, and the whitespace issue would probably just mean that all whitespace would be transformed to &#32;. An unrecognized character could also be detected with a combination of <span class="error"> and #iferror. So I understand if you want to kill this. However, by doing so you will also kill the ability to extract just the first character from a string, which is badly needed for wikisource:Template:Author. Currently, the template is not very usable: users must manually input the first character of the author's last name as a separate parameter. So don't think that by taking away the ability to automate this you're helping usability in the slightest.

#Comment by Brion VIBBER (talk | contribs)   02:41, 18 February 2009

Reverted this for now in r47411.

Status & tagging log