r61258 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r61257‎ | r61258 | r61259 >
Date:17:23, 19 January 2010
Author:thomasv
Status:reverted (Comments)
Tags:
Comment:
if available, use iconv because it requires less memory (fixes bug 21809) (follow up to r55768 and r55843)
Modified paths:
  • /trunk/phase3/includes/DjVuImage.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/DjVuImage.php
@@ -251,7 +251,13 @@
252252 wfProfileOut( 'djvutxt' );
253253 if( $retval == 0) {
254254 # Get rid of invalid UTF-8, strip control characters
255 - $txt = UtfNormal::cleanUp( $txt );
 255+ if( is_callable( 'iconv' ) ) {
 256+ wfSuppressWarnings();
 257+ $txt = iconv( "UTF-8","UTF-8//IGNORE", $txt );
 258+ wfRestoreWarnings();
 259+ } else {
 260+ $txt = UtfNormal::cleanUp( $txt );
 261+ }
256262 $txt = preg_replace( "/[\013\035\037]/", "", $txt );
257263 $txt = htmlspecialchars($txt);
258264 $txt = preg_replace( "/\((page\s[\d-]*\s[\d-]*\s[\d-]*\s[\d-]*\s*\&quot;([^<]*?)\&quot;\s*|)\)/s", "<PAGE value=\"$2\" />", $txt );

Follow-up revisions

RevisionCommit summaryAuthorDate
r69333* revert r61258...mah17:34, 14 July 2010

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r55768get rid of invalid UTF8, strip control charactersthomasv14:19, 3 September 2009
r55843use UtfNormal::cleanUp() (r55768)thomasv06:45, 5 September 2009

Comments

#Comment by Simetrical (talk | contribs)   16:57, 20 January 2010

If iconv() does the same as UtfNormal::cleanUp() but faster, shouldn't you instead fix UtfNormal::cleanUp() to call it if available? Or is iconv() only a useful substitute in this particular case for some reason?

#Comment by Tim Starling (talk | contribs)   06:26, 28 January 2010

How did you confirm that this will fix bug 21809? Did you get a sysadmin to isolate the problem?

#Comment by ThomasV (talk | contribs)   07:12, 28 January 2010

Simetrical : I am not sure to understand what UtfNormal::cleanUp() does, so I do not know if we can replace it in all situations.

Tim : No, I did not get a sysadmin to tell me what the error message it. The problem was isolated by the reporter of the bug (Philippe Elie). The bug could be reproduced by lowering PHP's memory limit in LocalSettings.php. I tested the fix on my box. The memory limit was exceeded when UtfNormal::cleanUp() is called, not with iconv.

If you want to reproduce the bug you have to use one of the djvu files indicated in the bug report; most djvu files will not cause this. It does not just depend on the size of the text layer, but also on whether the text layer contains invalid utf8.

#Comment by 😂 (talk | contribs)   14:05, 19 February 2010
  1. Does this always work as you intended? What about environments where iconv() is not available and it uses the fallback defined in GlobalFunctions?
  2. Also I agree with Simetrical in the first comment...if this speed increase is generally useful, we should move it to UtfNormal::cleanUp() so it benefits as well.
#Comment by ThomasV (talk | contribs)   14:40, 3 March 2010

I do not know about the iconv fallback. I used iconv() in the first place (see r55768), and brion suggested to use UtfNormal::cleanUp() instead. I guess brion should have known about the fallback.

I am not proposing this change as a speed improvement, but in order to fix a bug (21809) that is quite annoying for a number of contributors.


#Comment by 😂 (talk | contribs)   15:03, 3 March 2010

Then you didn't understand Brion's point back in r55768, nor have you looked in GlobalFunctions where we define our own iconv(). Brion said that this won't work in situations where iconv() is not natively available. By checking like this:

if( is_callable( 'iconv' ) {
...
}

You don't actually check if iconv() is there; it will return true regardless since we defined our own iconv() in GlobalFunctions. This _should_ go back to using UtfNormal::cleanUp() like Brion suggested.

However, if indeed this is a speed improvement for the case when iconv() _does_ exist then UtfNormal::cleanUp() should be patched to use this shortcut instead (then all callers of cleanUp() benefit). And I think more info/better testing would be appropriate, per Tim a few comments up.

#Comment by ThomasV (talk | contribs)   15:21, 3 March 2010

Indeed I did not understand brion's point, because I did not know there was a fallback in GlobalFunctions. Feel free to revert this, and to propose another fix for bug 21809.

I am not able to do any testing at the moment, but I really would love to see this bug fixed, because it is a real problem for users.

Status & tagging log