Time to switch to a non-executable file format?
Maybe, although it might still go OOM just reading all the entries into an array. I was wondering if splitting it up into per-language files would work. It'd be slow and ugly, but maybe it would use less memory, since the data from all languages other than the parameter to recache() would be thrown away after each file is read.
I'm not much for splitting up into multiple files; that's exactly what we got rid of a few years ago. Is the total volume of keys an issue, or the number of languages? In any case, with our current loading system, we can fairly easily split the message keys in this file into multiple files. Would the latter resolve the issue?
Yes, that would probably fix it.
Okay. I can do that. Will do it in the next few hours.
If we have good file format standards, can we not have code that parses the PHP i18n files, throwing away the 95% that doesn't matter for any particular language? That way it won't load the whole thing.
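To make the parse-and-discard idea concrete, here is a minimal sketch in Python (not the actual MediaWiki loader). It assumes a simplified layout where each language block starts with `$messages['xx'] = array(`, ends with a line containing only `);`, and holds one single-quoted `'key' => 'value',` pair per line. The function name `load_language` and the sample data are hypothetical. Real i18n files may contain double-quoted or multi-line strings, which this does not handle.

```python
import io
import re

# Assumed layout: a block opens with $messages['xx'] = array( and closes with ");".
BLOCK_START = re.compile(r"^\$messages\['([^']+)'\]\s*=\s*array\(\s*$")
# One single-quoted 'key' => 'value' pair per line (an assumption about the format).
ENTRY = re.compile(r"^\s*'((?:[^'\\]|\\.)*)'\s*=>\s*'((?:[^'\\]|\\.)*)',?\s*$")


def unescape(s):
    # PHP single-quoted strings only treat \\ and \' as escapes.
    return re.sub(r"\\([\\'])", r"\1", s)


def load_language(lines, wanted):
    """Stream over lines, keeping only the entries for the wanted language."""
    messages = {}
    current = None  # language code of the block we are currently inside
    for line in lines:
        start = BLOCK_START.match(line)
        if start:
            current = start.group(1)
            continue
        if line.strip() == ");":
            current = None
            continue
        if current == wanted:
            entry = ENTRY.match(line)
            if entry:
                messages[unescape(entry.group(1))] = unescape(entry.group(2))
    return messages


# Hypothetical sample data, for illustration only.
SAMPLE = """\
$messages['en'] = array(
    'greeting' => 'Hello',
    'quote' => 'It\\'s fine',
);
$messages['de'] = array(
    'greeting' => 'Hallo',
);
"""

german = load_language(io.StringIO(SAMPLE), "de")
english = load_language(io.StringIO(SAMPLE), "en")
```

Because the file is read line by line and only matching entries are stored, peak memory should scale with one language's messages rather than with the whole file.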
What if the license texts were stored as external HTML pages that the MediaWiki installation on the cluster links to, rather than being stored inside it, like license.wikimedia.org/en etc.?
Actually, why not make it a wiki page?
It's not really license *texts* but just a hell of a lot of license *names* that are in that file.
I have always considered this file to be structured in a brain-damaged way.
There are lots of entries like:
This file is licensed under the <license> <country> license
license: <type> <version>
type: Cc-zero | Cc-by | Cc-by-sa
version: 1.0 | 2.0 | 2.5 | 3.0
country: generic + 21 countries
With one message per combination (just in case you may want each license with the phrase in a different order).
So we get ~250 entries per language just for license names. With 230 languages, we could have up to ~60k messages for that
(it wasn't quite that bad: not all languages had them translated, so in total there were ~23k messages).
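As a rough check of those numbers, the combinations can be enumerated in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes every type/version/country combination exists, which is probably not true in the real file, so the full cross product comes out slightly above the ~250 quoted above.

```python
from itertools import product

types = ["Cc-zero", "Cc-by", "Cc-by-sa"]
versions = ["1.0", "2.0", "2.5", "3.0"]
countries = 22   # generic + 21 countries
languages = 230

# One message per (type, version, country) combination.
per_language = len(list(product(types, versions, range(countries))))

# Upper bound if every language translated every combination.
upper_bound = per_language * languages

print(per_language, upper_bound)
```

That gives 264 combinations per language and 60,720 messages at most, in line with the "~250" and "up to 60k" estimates.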