r111606 MediaWiki - Code Review archive

Repository:	MediaWiki
Revision:	< r111605 | r111606 | r111607 >
Date:	01:26, 16 February 2012
Author:	tstarling
Status:	ok (Comments)
Tags:
Comment:	Remove everything except English from WikimediaLicenseTexts.i18n.php, to avoid OOM
Modified paths:	/branches/wmf/1.19wmf1/extensions/WikimediaMessages/WikimediaLicenseTexts.i18n.php (modified)

Diff

The diff is too large to display.

Follow-up revisions

Revision	Commit summary	Author	Date
r111713	Split up the 4.7MB WikimediaLicenseTexts.i18n.php into two files to address O...	siebrand	01:48, 17 February 2012

Comments

#Comment by Nikerabbit 07:09, 16 February 2012

Time to switch to non-executable file format?

#Comment by Tim Starling 07:19, 16 February 2012

Maybe, although it might still go OOM just reading all the entries into an array. I was wondering if splitting it up into per-language files would work. It'd be slow and ugly, but maybe it would use less memory, since the data for every language other than the one passed to recache() would be thrown away after each file is read.
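A minimal sketch of the per-language-file idea, written in Python with JSON files standing in for the PHP i18n files. The one-file-per-language layout (`en.json`, `de.json`), the function name `load_language_file` and the message keys are all hypothetical; the point is only that a request for one language never opens the other languages' data.

```python
import json
import tempfile
from pathlib import Path


def load_language_file(directory: Path, code: str) -> dict:
    """Return the messages for one language from a per-language file.

    Hypothetical layout: one JSON file per language (e.g. en.json, de.json)
    in `directory`. Only the requested language's file is opened, so the
    other languages' data is never loaded at all.
    """
    path = directory / f"{code}.json"
    if not path.is_file():
        return {}  # no translations for this language
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# Usage with two throwaway files standing in for the real i18n data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "en.json").write_text(json.dumps({"hypothetical-key": "English text"}), encoding="utf-8")
    (root / "de.json").write_text(json.dumps({"hypothetical-key": "Deutscher Text"}), encoding="utf-8")
    en_messages = load_language_file(root, "en")
    fr_messages = load_language_file(root, "fr")

print(en_messages)  # {'hypothetical-key': 'English text'}
print(fr_messages)  # {}
```

The cost Tim mentions shows up as one file per language to manage and open, in exchange for never holding more than one language in memory.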

#Comment by Siebrand 21:56, 16 February 2012

I'm not much in favour of splitting it up into multiple files; that's exactly what we got rid of a few years ago. Is the issue the total volume of keys, or the number of languages? Either way, with our current loading system we can fairly easily split the message keys in this file across multiple files. Would that resolve the issue?
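A rough sketch of splitting by message key rather than by language, again in Python with JSON standing in for PHP; it is not MediaWiki's actual loader, and the file names and function name are hypothetical. Every file still holds every language, but only for a subset of the keys, and each file is reduced to the requested language before the next one is read.

```python
import json
import tempfile
from pathlib import Path


def load_language_from_split_files(paths, code):
    """Merge one language's messages from files that are split by message key.

    Each (hypothetical) file maps language code -> {key: text} for a subset
    of the keys, with every language in every file. Files are read one at a
    time and everything except `code` is dropped before the next one is read,
    so peak memory is roughly the largest single file plus the merged result.
    """
    merged = {}
    for path in paths:
        with Path(path).open(encoding="utf-8") as f:
            all_languages = json.load(f)
        merged.update(all_languages.get(code, {}))
        del all_languages  # release the other languages before the next file
    return merged


# Usage: two throwaway files, each holding a different subset of the keys.
with tempfile.TemporaryDirectory() as tmp:
    part1 = Path(tmp) / "part1.json"
    part2 = Path(tmp) / "part2.json"
    part1.write_text(json.dumps({"en": {"key-a": "A"}, "de": {"key-a": "A (de)"}}), encoding="utf-8")
    part2.write_text(json.dumps({"en": {"key-b": "B"}}), encoding="utf-8")
    en_messages = load_language_from_split_files([part1, part2], "en")
    de_messages = load_language_from_split_files([part1, part2], "de")

print(en_messages)  # {'key-a': 'A', 'key-b': 'B'}
print(de_messages)  # {'key-a': 'A (de)'}
```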

#Comment by Tim Starling 23:45, 16 February 2012

Yes, that would probably fix it.

#Comment by Siebrand 00:14, 17 February 2012

Okay. I can do that. Will do it in the next few hours.

#Comment by Aaron Schulz 22:00, 16 February 2012

If we have good file format standards, can we not have code that parses the PHP i18n files, throwing away the 95% that doesn't matter for any particular language? That way it wouldn't load the whole thing.
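The filtering idea can be sketched without executing the file. Parsing real PHP i18n files would need a PHP tokenizer, so this Python sketch swaps in a hypothetical line-oriented format (`lang<TAB>key<TAB>text`) purely to show the "read everything, keep one language" step; the format and the function name are assumptions, not anything MediaWiki provides.

```python
def messages_for_language(lines, code):
    """Stream a hypothetical "lang<TAB>key<TAB>text" file and keep one language.

    Lines for other languages are only split far enough to read the language
    code and are then discarded, so memory grows with the requested language
    rather than with the whole file. `lines` can be an open file object,
    which Python iterates lazily, line by line.
    """
    result = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        lang, _, rest = line.partition("\t")
        if lang != code:
            continue  # not the requested language: throw it away
        key, _, text = rest.partition("\t")
        result[key] = text
    return result


sample = [
    "en\tkey-a\tEnglish text\n",
    "de\tkey-a\tDeutscher Text\n",
    "\n",
    "en\tkey-b\tMore English\n",
]
print(messages_for_language(sample, "en"))  # {'key-a': 'English text', 'key-b': 'More English'}
```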

#Comment by Petrb 13:40, 16 February 2012

What if the license texts were stored as external HTML pages that the MediaWiki installations on the cluster would link to, rather than having the license text stored inside, e.g. license.wikimedia.org/en and so on?

#Comment by Petrb 13:41, 16 February 2012

Actually, why not make it a wiki page?

#Comment by Nikerabbit 17:19, 16 February 2012

It's not really license *texts* that are in that file, just a hell of a lot of license *names*.

#Comment by Platonides 22:32, 16 February 2012

I have always considered this file to be structured in a brain-damaged way.

There are lots of entries like:

    This file is licensed under the <license> <country> license

where:

    license: <type> <version>
    type: Cc-zero | Cc-by | Cc-by-sa
    version: 1.0 | 2.0 | 2.5 | 3.0
    country: generic + 21 countries

There is one message per combination (just in case you want each license with the phrase in a different order).

So we get ~250 entries per language just for license names. With 230 languages, that could add up to ~60k messages.

(In practice it wasn't that bad, since not all languages had them translated; in total there were ~23k messages.)
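The arithmetic behind Platonides' estimate can be checked quickly. This sketch treats every type/version/country combination as existing, which gives an upper bound slightly above the ~250 entries per language quoted; the composed-from-parts count at the end is a hypothetical comparison only, and composing would give up the per-combination word-order flexibility mentioned above.

```python
# Counts taken from the comment above.
types = ["Cc-zero", "Cc-by", "Cc-by-sa"]
versions = ["1.0", "2.0", "2.5", "3.0"]
countries = 1 + 21  # "generic" plus 21 countries
languages = 230

# Upper bound: one message for every type/version/country combination.
per_language_max = len(types) * len(versions) * countries
print(per_language_max)              # 264
print(per_language_max * languages)  # 60720
print(250 * languages)               # 57500, using the ~250 entries per language

# Hypothetical alternative: compose the name from parts plus one template.
per_language_composed = len(types) + len(versions) + countries + 1
print(per_language_composed)         # 30
```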

Status & tagging log

19:49, 17 February 2012 Awjrichards changed the status of r111606 [removed: new added: ok]