r81597 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r81596‎ | r81597 | r81598 >
Date:14:47, 6 February 2011
Author:hashar
Status:ok (Comments)
Tags:
Comment:
bugfix for wfBCP47 and code coverage

Language code are case insensitive. The BCP 47 recommands nice
formatting nonetheless. This patch enhance our formatting:
- tags preceded by the private tag 'x' are now lower case
- 4 letters tags are now lower case with first letter uper cased

Please note the RFC seems to have a bug for az-Arab-x-AZE-derbend
which should be az-Arab-x-aze-derbend . I have changed our test
to reflect this and added a comment for later reference.
Modified paths:
  • /trunk/phase3/includes/GlobalFunctions.php (modified) (history)
  • /trunk/phase3/tests/phpunit/includes/GlobalTest.php (modified) (history)

Diff [purge]

Index: trunk/phase3/tests/phpunit/includes/GlobalTest.php
@@ -632,6 +632,133 @@
633633
634634 }
635635
 636+ /**
 637+ * test @see wfBCP47().
 638+ * Please note the BCP explicitly state that language codes are case
 639+ * insensitive, there are some exceptions to the rule :)
 640+ * This test is used to verify our formatting against all lower and
 641+ * all upper cases language code.
 642+ *
 643+ * @see http://tools.ietf.org/html/bcp47
 644+ * @dataProvider provideLanguageCodes()
 645+ */
 646+ function testBCP47( $code, $expected ) {
 647+ $code = strtolower( $code );
 648+ $this->assertEquals( $expected, wfBCP47($code),
 649+ "Applying BCP47 standard to lower case '$code'"
 650+ );
 651+
 652+ $code = strtoupper( $code );
 653+ $this->assertEquals( $expected, wfBCP47($code),
 654+ "Applying BCP47 standard to upper case '$code'"
 655+ );
 656+ }
 657+
 658+ /**
 659+ * Array format is ($code, $expected)
 660+ */
 661+ function provideLanguageCodes() {
 662+ return array(
 663+ // Extracted from BCP47 (list not exhaustive)
 664+ # 2.1.1
 665+ array( 'en-ca-x-ca' , 'en-CA-x-ca' ),
 666+ array( 'sgn-be-fr' , 'sgn-BE-FR' ),
 667+ array( 'az-latn-x-latn', 'az-Latn-x-latn' ),
 668+ # 2.2
 669+ array( 'sr-Latn-RS', 'sr-Latn-RS' ),
 670+ array( 'az-arab-ir', 'az-Arab-IR' ),
 671+
 672+ # 2.2.5
 673+ array( 'sl-nedis' , 'sl-nedis' ),
 674+ array( 'de-ch-1996', 'de-CH-1996' ),
 675+
 676+ # 2.2.6
 677+ array(
 678+ 'en-latn-gb-boont-r-extended-sequence-x-private',
 679+ 'en-Latn-GB-boont-r-extended-sequence-x-private'
 680+ ),
 681+
 682+ // Examples from BCP47 Appendix A
 683+ # Simple language subtag:
 684+ array( 'DE', 'de' ),
 685+ array( 'fR', 'fr' ),
 686+ array( 'ja', 'ja' ),
 687+
 688+ # Language subtag plus script subtag:
 689+ array( 'zh-hans', 'zh-Hans'),
 690+ array( 'sr-cyrl', 'sr-Cyrl'),
 691+ array( 'sr-latn', 'sr-Latn'),
 692+
 693+ # Extended language subtags and their primary language subtag
 694+ # counterparts:
 695+ array( 'zh-cmn-hans-cn', 'zh-cmn-Hans-CN' ),
 696+ array( 'cmn-hans-cn' , 'cmn-Hans-CN' ),
 697+ array( 'zh-yue-hk' , 'zh-yue-HK' ),
 698+ array( 'yue-hk' , 'yue-HK' ),
 699+
 700+ # Language-Script-Region:
 701+ array( 'zh-hans-cn', 'zh-Hans-CN' ),
 702+ array( 'sr-latn-RS', 'sr-Latn-RS' ),
 703+
 704+ # Language-Variant:
 705+ array( 'sl-rozaj' , 'sl-rozaj' ),
 706+ array( 'sl-rozaj-biske', 'sl-rozaj-biske' ),
 707+ array( 'sl-nedis' , 'sl-nedis' ),
 708+
 709+ # Language-Region-Variant:
 710+ array( 'de-ch-1901' , 'de-CH-1901' ),
 711+ array( 'sl-it-nedis' , 'sl-IT-nedis' ),
 712+
 713+ # Language-Script-Region-Variant:
 714+ array( 'hy-latn-it-arevela', 'hy-Latn-IT-arevela' ),
 715+
 716+ # Language-Region:
 717+ array( 'de-de' , 'de-DE' ),
 718+ array( 'en-us' , 'en-US' ),
 719+ array( 'es-419', 'es-419'),
 720+
 721+ # Private use subtags:
 722+ array( 'de-ch-x-phonebk' , 'de-CH-x-phonebk' ),
 723+ array( 'az-arab-x-aze-derbend', 'az-Arab-x-aze-derbend' ),
 724+ /**
 725+ * Previous test does not reflect the BCP which states:
 726+ * az-Arab-x-AZE-derbend
 727+ * AZE being private, it should be lower case, hence the test above
 728+ * should probably be:
 729+ #array( 'az-arab-x-aze-derbend', 'az-Arab-x-AZE-derbend' ),
 730+ */
 731+
 732+ # Private use registry values:
 733+ array( 'x-whatever', 'x-whatever' ),
 734+ array( 'qaa-qaaa-qm-x-southern', 'qaa-Qaaa-QM-x-southern' ),
 735+ array( 'de-qaaa' , 'de-Qaaa' ),
 736+ array( 'sr-latn-qm', 'sr-Latn-QM' ),
 737+ array( 'sr-qaaa-rs', 'sr-Qaaa-RS' ),
 738+
 739+ # Tags that use extensions
 740+ array( 'en-us-u-islamcal', 'en-US-u-islamcal' ),
 741+ array( 'zh-cn-a-myext-x-private', 'zh-CN-a-myext-x-private' ),
 742+ array( 'en-a-myext-b-another', 'en-a-myext-b-another' ),
 743+
 744+ # Invalid:
 745+ // de-419-DE
 746+ // a-DE
 747+ // ar-a-aaa-b-bbb-a-ccc
 748+
 749+ /*
 750+ // ISO 15924 :
 751+ array( 'sr-Cyrl', 'sr-Cyrl' ),
 752+ array( 'SR-lATN', 'sr-Latn' ), # FIXME fix our function?
 753+ array( 'fr-latn', 'fr-Latn' ),
 754+ // Use lowercase for single segment
 755+ // ISO 3166-1-alpha-2 code
 756+ array( 'US', 'us' ), # USA
 757+ array( 'uS', 'us' ), # USA
 758+ array( 'Fr', 'fr' ), # France
 759+ array( 'va', 'va' ), # Holy See (Vatican City State)
 760+ */);
 761+ }
 762+
636763 /* TODO: many more! */
637764 }
638765
Index: trunk/phase3/includes/GlobalFunctions.php
@@ -3397,6 +3397,7 @@
33983398
33993399 /**
34003400 * Get the normalised IETF language tag
 3401+ * See unit test for examples.
34013402 * @param $code String: The language code.
34023403 * @return $langCode String: The language code which complying with BCP 47 standards.
34033404 */
@@ -3404,12 +3405,15 @@
34053406 $codeSegment = explode( '-', $code );
34063407 foreach ( $codeSegment as $segNo => $seg ) {
34073408 if ( count( $codeSegment ) > 0 ) {
 3409+ // when previous segment is x, it is a private segment and should be lc
 3410+ if( $segNo > 0 && strtolower( $codeSegment[($segNo - 1)] ) == 'x') {
 3411+ $codeBCP[$segNo] = strtolower( $seg );
34083412 // ISO 3166 country code
3409 - if ( ( strlen( $seg ) == 2 ) && ( $segNo > 0 ) ) {
 3413+ } elseif ( ( strlen( $seg ) == 2 ) && ( $segNo > 0 ) ) {
34103414 $codeBCP[$segNo] = strtoupper( $seg );
34113415 // ISO 15924 script code
34123416 } elseif ( ( strlen( $seg ) == 4 ) && ( $segNo > 0 ) ) {
3413 - $codeBCP[$segNo] = ucfirst( $seg );
 3417+ $codeBCP[$segNo] = ucfirst( strtolower( $seg ) );
34143418 // Use lowercase for other cases
34153419 } else {
34163420 $codeBCP[$segNo] = strtolower( $seg );

Follow-up revisions

RevisionCommit summaryAuthorDate
r97445Minor whitespace fixes in GlobalFunctions.php...krinkle23:21, 18 September 2011

Comments

#Comment by Hashar (talk | contribs)   14:51, 6 February 2011

$ php phpunit.php --configuration suite.xml --filter BCP

PHPUnit 3.5.10 by Sebastian Bergmann.
.......................................
Time: 1 second, Memory: 19.75Mb
OK (39 tests, 78 assertions)

Status & tagging log