r102205 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r102205 (previous: r102204, next: r102206)
Date: 23:02, 6 November 2011
Author: krinkle
Status: resolved
Tags:
Comment:
[TsIntuition] Minor fixes in acceptableLanguages()
* Adding an example of the static utility function to demo/demo6
* Moving the function out of the TsIntuition class into TsIntuitionUtil where the other static utility functions are, renaming to getAcceptableLanguages
* Improving documentation/variable naming a little bit
* Whitespace / curly braces fixes
* Although I'm not 100% sure about this, I've added a FIXME about the q-val defaulting to 1. It needs a look-ahead technique to be more solid, right now low-level accept-languages are getting too high. Example:
-- code
getAcceptableLanguages: ( 'nl-be,nl;q=0.7,en-us,en;q=0.3' ):
array(4) {
["nl-be"]=>
string(1) "1" // should be 0.7
["en-us"]=>
string(1) "1" // should be 0.3
["nl"]=>
string(3) "0.7"
["en"]=>
string(3) "0.3"
}
-- /code
See demo6 for this in action

* Follows-up r100234
Modified paths:
  • /trunk/tools/ToolserverI18N/TsIntuition.php (modified)
  • /trunk/tools/ToolserverI18N/TsIntuitionUtil.php (modified)
  • /trunk/tools/ToolserverI18N/public_html/demo/demo6.php (modified)

Diff

Index: trunk/tools/ToolserverI18N/TsIntuition.php
@@ -1217,42 +1217,6 @@
 	}
 
 	/**
-	 * Return a list of acceptable languages from an Accept-Language header
-	 * @param $acceptLanguage String List of language tags, as given in
-	 * http Accept-Language header (omit to fetch from $_SERVER['HTTP_ACCEPT_LANGUAGE'])
-	 * @return array sorted with the candidate languages as keys and q-values asvalues.
-	 */
-	static function acceptableLanguages($acceptLanguage = false) {
-		if ( $acceptLanguage === false ) {
-			$acceptLanguage = @$_SERVER['HTTP_ACCEPT_LANGUAGE'];
-		}
-
-		$acceptableLanguages = array();
-
-		//Accept-Language: 1#( language-range [ ";" "q" "=" qvalue ] )
-		//The list of elements is separated by comma and optional LWS
-		$languages = explode( ',', $acceptLanguage );
-		foreach ( $languages as $language ) {
-			$language = trim( $language ); // Remove optional LWS
-
-			// Extract the language-range and q-value
-			if ( !preg_match( '/^([A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*|\*)(?:\s*;\s*q\s*=\s*([01](?:\.[0-9]{0,3})?))?$/', $language, $m ) )
-				continue;
-
-			// We are not interested in the total match.
-			array_shift( $m );
-			$m[] = 1; // Default q-value is 1
-			list( $languageRange, $qvalue ) = $m;
-
-			$acceptableLanguages[$languageRange] = $qvalue;
-		}
-
-		arsort( $acceptableLanguages, SORT_NUMERIC ); // This is not an stable sort, but it isn't needed
-
-		return $acceptableLanguages;
-	}
-
-	/**
 	 * Check language choice tree in the following order:
 	 * - First: Construct override
 	 * - Second: Parameter override
@@ -1266,46 +1230,54 @@
 	 */
 	private function initLangSelect( $option ) {
 		$set = false;
+
 		if ( isset( $option ) && !empty( $option ) ) {
 			$set = $this->setLang( $option );
 		}
+
 		if ( !$set && $this->getUseRequestParam() === true && isset( $_GET[ $this->paramNames['userlang'] ] ) ) {
 			$set = $this->setLang( $_GET[ $this->paramNames['userlang'] ] );
 		}
+
 		if ( !$set && isset( $_COOKIE[ $this->cookieNames['userlang'] ] ) ) {
 			$set = $this->setLang( $_COOKIE[ $this->cookieNames['userlang'] ] );
 		}
 
 		if ( !$set ) {
-			$acceptableLanguages = self::acceptableLanguages();
-			foreach ( $acceptableLanguages as $lang => $q ) {
-
-				if ( $lang == '*' ) {
-					/* We choose the first available language which is not in $acceptableLanguages
-					 * The special * range matches every tag not matched by any other range, languages
-					 * present in $acceptableLanguages will either have a lower q-value, or be missing
-					 * from availableLanguages.
-					 * The order will be the one in the i18n file: en, af, ar...
+			$acceptableLanguages = TsIntuitionUtil::GetAcceptableLanguages();
+			foreach ( $acceptableLanguages as $acceptLang => $qVal ) {
+
+				if ( $acceptLang == '*' ) {
+
+					/**
+					 * We pick the first available language which is not in $acceptableLanguages.
+					 * The special * range matches every tag not matched by any other range.
+					 * Other language codes in $acceptableLanguages will either have a lower q-value,
+					 * or be missing from availableLanguages.
+					 * The order will be the one in the i18n file: en, af, ar...
 					 */
-
-					foreach ( $this->availableLanguages as $lang => $true ) {
-						if (! isset( $acceptableLanguages[$lang] ) ) {
-							$set = $this->setLang( $lang );
+
+					foreach ( $this->availableLanguages as $availableLang => $true ) {
+						if ( !isset( $acceptableLanguages[$availableLang] ) ) {
+							$set = $this->setLang( $availableLang );
 							break;
 						}
 					}
-					if ( $set )
+					if ( $set ) {
 						break;
-				} elseif ( isset( $this->availableLanguages[$lang] ) ) {
-					$set = $this->setLang( $lang );
+					}
+
+				} elseif ( isset( $this->availableLanguages[$acceptLang] ) ) {
+					$set = $this->setLang( $acceptLang );
 					break;
 				}
 			}
 		}
-
+
 		if ( !$set ) {
 			$set = $this->setLang( 'en' );
 		}
+
 		return $set;
 	}
 
Index: trunk/tools/ToolserverI18N/public_html/demo/demo6.php
@@ -50,7 +50,23 @@
 
 );
 
+// GetAcceptableLanguages
+echo "<br />getAcceptableLanguages: (default: \$_SERVER['HTTP_ACCEPT_LANGUAGE']: {$_SERVER['HTTP_ACCEPT_LANGUAGE']}):<br />";
+var_dump(
 
+	TsIntuitionUtil::getAcceptableLanguages( @$_SERVER['HTTP_ACCEPT_LANGUAGE'] )
+
+);
+
+$acceptLang = 'nl-be,nl;q=0.7,en-us,en;q=0.3';
+echo "<br />getAcceptableLanguages: ( '{$acceptLang}' ):<br />";
+var_dump(
+
+	TsIntuitionUtil::getAcceptableLanguages( $acceptLang )
+
+);
+
+
 /* View source */
 view_source( __FILE__ );
 close_demo();
\ No newline at end of file
Index: trunk/tools/ToolserverI18N/TsIntuitionUtil.php
@@ -123,4 +123,57 @@
 		}
 	}
 
+	/**
+	 * Return a list of acceptable languages from an Accept-Language header
+	 * @param $rawList String List of language tags, formatted like an
+	 * HTTP Accept-Language header (optional; defaults to $_SERVER['HTTP_ACCEPT_LANGUAGE'])
+	 * @return array keyed by language codes with q-values as values.
+	 */
+	public static function getAcceptableLanguages( $rawList = false ) {
+		if ( $rawList === false ) {
+			$rawList = @$_SERVER['HTTP_ACCEPT_LANGUAGE'];
+		}
+
+		$acceptableLanguages = array();
+
+		// Accept-Language: 1#( language-range [ ";" "q" "=" qvalue ] )
+		// Example: "nl-be,nl;q=0.7,en-us,en;q=0.3"
+		// The list of elements is separated by comma and optional LWS
+		$languages = explode( ',', $rawList );
+		foreach ( $languages as $language ) {
+			$language = trim( $language ); // Remove optional LWS
+
+			// Extract the language-range and, if present, the q-value
+			if ( !preg_match( '/^([A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*|\*)(?:\s*;\s*q\s*=\s*([01](?:\.[0-9]{0,3})?))?$/', $language, $m )
+			) {
+				continue;
+			}
+
+			/**
+			 * $m is now an array with either two or three values:
+			 * - array( 'lang-code', 'lang-code' )
+			 * - array( 'lang-code;q=val', 'lang-code', 'val' )
+			 */
+
+			// We are not interested in the first value.
+			array_shift( $m );
+
+			// Default to 1 as q-val
+			// @FIXME: In case "nl-be,nl;q=0.7,en-us,en;q=0.3", "en" gets defaulted to '1',
+			// it should default to the next q-val (0.3 in this case)
+			if ( !isset( $m[1] ) ) {
+				$m[1] = '1';
+			}
+
+			list( $langCode, $qVal ) = $m;
+
+			$acceptableLanguages[$langCode] = $qVal;
+		}
+
+		// Sort by q value in descending order
+		arsort( $acceptableLanguages, SORT_NUMERIC );
+
+		return $acceptableLanguages;
+	}
+
 }

Follow-up revisions

Revision | Commit summary | Author | Date
r102347 | htmlescape user-provided HTTP_ACCEPT_LANGUAGE (although unlikely to cause any... | platonides | 23:56, 7 November 2011
r102453 | [TsIntuition] rm fixme-comment about q-val default per r102205 CR | krinkle | 21:44, 8 November 2011

Past revisions this follows-up on

Revision | Commit summary | Author | Date
r100234 | Guess the language from Accept-Language header. | platonides | 15:56, 19 October 2011

Comments

Comment by Platonides, 23:49, 6 November 2011

I disagree. nl-be,nl;q=0.7,en-us,en;q=0.3 is equivalent to nl-be;q=1,nl;q=0.7,en-us;q=1,en;q=0.3

The order in the list is not relevant. (Personally I would have preferred to sort values with the same q-value by order of appearance in the header, but PHP's sort algorithms are not stable.)

Note that 'en' getting q-value 1 (as the code comment suggests, contrary to your summary) would indeed be a bug, but that is not happening.


Or are you referring to the "prefix matching rule" (which is not implemented)? I think that rule would just allow serving an en-us document to a user who only listed en in their Accept-Language. So it is not relevant to this example.
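
For reference, the prefix matching rule from RFC 2616 section 14.4 can be sketched roughly as follows. This is only an illustration: languageRangeMatches() is a hypothetical helper that does not exist in TsIntuition, and the "*" range is simplified here to match everything.

```php
<?php
/**
 * Hypothetical helper (not part of TsIntuitionUtil): does the
 * language-range $range match the language tag $tag?
 */
function languageRangeMatches( $range, $tag ) {
	$range = strtolower( $range );
	$tag = strtolower( $tag );

	// Simplification: treat "*" as matching any tag
	if ( $range === '*' || $range === $tag ) {
		return true;
	}

	// A prefix only counts if the next character in the tag is "-"
	return strpos( $tag, $range . '-' ) === 0;
}

var_dump( languageRangeMatches( 'en', 'en-us' ) ); // bool(true)
var_dump( languageRangeMatches( 'en-us', 'en' ) ); // bool(false)
var_dump( languageRangeMatches( 'en', 'eng' ) );   // bool(false)
```

The rule works in one direction only: a header listing "en" accepts an en-us document, while a header listing only "en-us" does not accept a plain en one.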

Comment by Krinkle, 02:07, 7 November 2011
+			// @FIXME: In case "nl-be,nl;q=0.7,en-us,en;q=0.3", "en" gets defaulted to '1',

should be

			// @FIXME: In case "nl-be,nl;q=0.7,en-us,en;q=0.3", "en-us" gets defaulted to '1',

That was a typo

Comment by Krinkle, 02:13, 7 November 2011

Like I said in the commit message: "Although I'm not 100% sure about this, I've added a FIXME about the q-val defaulting to 1."

When I added languages in the Firefox preferences, the default for me was "nl, en-us, en", which ended up as "nl;q=0.7,en-us,en;q=0.3" in the HTTP header. So it appears that Firefox does not add q-values for all language codes, which to me clearly suggests it assumes that whatever processes the header uses either look-ahead or look-behind, rather than defaulting directly to something.

So "nl-be,nl;q=0.7, en-us,en;q=0.3" would be equivalent to "nl-be;q=0.7,nl;q=0.7, en-us;q=0.3,en;q=0.3".

If the specification says otherwise, I have no problem keeping it as it is. If the specification agrees, however, then the comments below are correct:

getAcceptableLanguages: ( 'nl-be,nl;q=0.7,en-us,en;q=0.3' ):
array(4) {
["nl-be"]=>
string(1) "1" // should be 0.7
["en-us"]=>
string(1) "1" // should be 0.3
["nl"]=>
string(3) "0.7"
["en"]=>
string(3) "0.3"
}
Comment by Platonides, 23:52, 7 November 2011

The relevant standard I followed is RFC 2616. The most relevant piece (from section 14.4) would be:

Each language-range MAY be given an associated quality value
which represents an estimate of the user's preference for the
languages specified by that range. The quality value defaults to "q=1".

If you find the standard to imply otherwise, please share.
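
To illustrate the default described in that passage, here is a minimal sketch of a parser that assigns q=1 to any language-range without an explicit q-value, and that also keeps order of appearance for equal q-values. parseAcceptLanguage() is a hypothetical stand-in written for this example, not the committed getAcceptableLanguages(); it reuses the regular expression from the diff and needs PHP 5.3 or later for the closure.

```php
<?php
/**
 * Hypothetical variant of getAcceptableLanguages(): a missing q-value
 * defaults to 1, and ties keep their order of appearance in the header.
 */
function parseAcceptLanguage( $rawList ) {
	$pattern = '/^([A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*|\*)(?:\s*;\s*q\s*=\s*([01](?:\.[0-9]{0,3})?))?$/';
	$entries = array();

	foreach ( explode( ',', $rawList ) as $position => $item ) {
		if ( !preg_match( $pattern, trim( $item ), $m ) ) {
			continue; // Skip malformed elements
		}
		$entries[] = array(
			'lang' => $m[1],
			// $m[2] is only set when the element carried a q-value
			'q' => isset( $m[2] ) ? (float)$m[2] : 1.0,
			'pos' => $position,
		);
	}

	// Sort by q-value descending; break ties by position in the header
	usort( $entries, function ( $a, $b ) {
		if ( $a['q'] != $b['q'] ) {
			return $a['q'] < $b['q'] ? 1 : -1;
		}
		return $a['pos'] - $b['pos'];
	} );

	$result = array();
	foreach ( $entries as $entry ) {
		$result[$entry['lang']] = $entry['q'];
	}
	return $result;
}

print_r( parseAcceptLanguage( 'nl-be,nl;q=0.7,en-us,en;q=0.3' ) );
```

With the example header from the commit message, the resulting key order is nl-be, en-us, nl, en: both unqualified entries get q=1 and keep their relative order from the header.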


Now to the Firefox behavior: I couldn't reproduce it.

When I add nl, en-us and en in Firefox languages, I get

nl,en-us;q=0.7,en;q=0.3

which is consistent with what I expected.

With the order 'nl, en-us, en' it's

nl,en-us;q=0.7,en;q=0.3

Setting also nl-be:

nl-be,nl;q=0.8,en-us;q=0.5,en;q=0.3

How did you get "nl-be,nl;q=0.7,en-us,en;q=0.3" from Firefox interface?


I looked for a Firefox bug like that, but didn't find one.

Yet there are some interesting ones, like 55800 or 58034 ("Accept-Language header needs q values"), which supports my view in the opening comment:

You can change the priority of the languages within the preferences, but no q
values are attached to each language. Thus, true content negotiators such as
mod_negotiation in Apache view all languages as being preferred equally, as
according to HTTP/1.1 (IE properly adds q values to the languages)

That bug led to the patch adding q-values, with comment 7 explaining Firefox's algorithm for q-value assignment:

1 is divided amongst the languages present. For
example, if you have three languages, the first gets a q of 1.000, the second
gets 0.667, the third gets 0.333.
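
The division described in that bug comment can be sketched like this. It is an illustration based only on the quoted description, not Firefox's actual code, and buildAcceptLanguage() is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: build an Accept-Language value by dividing 1
 * evenly amongst the languages, in order of preference.
 */
function buildAcceptLanguage( array $langs ) {
	$count = count( $langs );
	$parts = array();

	foreach ( array_values( $langs ) as $i => $lang ) {
		$q = ( $count - $i ) / $count;
		// The first language has q=1, which is the default and can be omitted
		$parts[] = $i === 0 ? $lang : sprintf( '%s;q=%.3f', $lang, $q );
	}

	return implode( ',', $parts );
}

echo buildAcceptLanguage( array( 'nl', 'en-us', 'en' ) ), "\n";
// nl,en-us;q=0.667,en;q=0.333
```

The headers reported earlier in this thread (q=0.7 and q=0.3 for three languages) look like the same values rounded to one decimal, but that is an inference from the numbers, not something the bug comment states.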


I don't know how that 'en-us' without a q-value ended up in your Accept-Language header.

Comment by Krinkle, 21:44, 8 November 2011

Alrighty, that's exactly what I was looking for. Removed fixme in r102453.
