r84057 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r84057 (previous r84056, next r84058)
Date: 21:56, 15 March 2011
Author: hashar
Status: reverted
Tags:
Comment:
bug 28040 Turkish: properly handle dotted and dotless i

As mentioned by Bawolff on code review, r83970 only handled the case change
of the first character and lacked support for full strings.

This patch overrides the uc and lc methods for the Turkish language (tr)
using preg_replace(), which knows about Unicode. Other possible choices
would have been:
- strtr() => outputs garbage
- mbstring => cannot know we are handling Turkish, and transforms i to I!

I have amended the RELEASE-NOTES to reflect this patch.

New tests are added as well, covering both the regular functions and
the Turkish-specific overrides. Result in testdox:

LanguageTr
[x] Change case of first char being dotted and dotless i
[x] Language tr lower casing override
[x] Language tr upper casing override
[x] Upper casing of a string with dotted and dot less i
[x] Lower casing of a string with dotted and dot less i
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified)
  • /trunk/phase3/languages/classes/LanguageTr.php (modified)
  • /trunk/phase3/tests/phpunit/languages/LanguageTrTest.php (modified)

Diff

Index: trunk/phase3/tests/phpunit/languages/LanguageTrTest.php
@@ -24,7 +24,7 @@
 	 * @see http://en.wikipedia.org/wiki/Dotted_and_dotless_I
 	 * @dataProvider provideDottedAndDotlessI
 	 */
-	function testDottedAndDotlessI( $func, $input, $inputCase, $expected ) {
+	function testChangeCaseOfFirstCharBeingDottedAndDotlessI( $func, $input, $inputCase, $expected ) {
 		if( $func == 'ucfirst' ) {
 			$res = $this->lang->ucfirst( $input );
 		} elseif( $func == 'lcfirst' ) {
@@ -62,6 +62,60 @@
 			array( 'lcfirst', 'IPhone', 'upper', 'ıPhone' ),
 
 		);
+	}
+
+##### LanguageTr specificities #############################################
+	/**
+	 * @cover LanguageTr:lc
+	 * See @bug 28040
+	 */
+	function testLanguageTrLowerCasingOverride() {
+		$this->assertEquals( 'ııııı', $this->lang->lc( 'IIIII') );
 	}
+	/**
+	 * @cover LanguageTr:uc
+	 * See @bug 28040
+	 */
+	function testLanguageTrUpperCasingOverride() {
+		$this->assertEquals( 'İİİİİ', $this->lang->uc( 'iiiii') );
+	}
 
+##### Upper casing a string #################################################
+	/**
+	 * Generic test for the Turkish dotted and dotless I strings
+	 * See @bug 28040
+	 * @dataProvider provideUppercaseStringsWithDottedAndDotlessI
+	 */
+	function testUpperCasingOfAStringWithDottedAndDotLessI( $expected, $input ) {
+		$this->assertEquals( $expected, $this->lang->uc( $input ) );
+	}
+	function provideUppercaseStringsWithDottedAndDotlessI() {
+		return array(
+			# expected, input string to uc()
+			array( 'IIIII', 'ııııı' ),
+			array( 'IIIII', 'IIIII' ), #identity
+			array( 'İİİİİ', 'iiiii' ), # Specifically handled by LanguageTr:uc
+			array( 'İİİİİ', 'İİİİİ' ), #identity
+		);
+	}
+
+##### Lower casing a string #################################################
+	/**
+	 * Generic test for the Turkish dotted and dotless I strings
+	 * See @bug 28040
+	 * @dataProvider provideLowercaseStringsWithDottedAndDotlessI
+	 */
+	function testLowerCasingOfAStringWithDottedAndDotLessI( $expected, $input ) {
+		$this->assertEquals( $expected, $this->lang->lc( $input ) );
+	}
+	function provideLowercaseStringsWithDottedAndDotlessI() {
+		return array(
+			# expected, input string to lc()
+			array( 'ııııı', 'IIIII' ), # Specifically handled by LanguageTr:lc
+			array( 'ııııı', 'ııııı' ), #identity
+			array( 'iiiii', 'İİİİİ' ),
+			array( 'iiiii', 'iiiii' ), #identity
+		);
+	}
+
 }
Index: trunk/phase3/languages/classes/LanguageTr.php
@@ -28,4 +28,16 @@
 		}
 	}
 
+	/** @see bug 28040 */
+	function uc( $string ) {
+		$string = preg_replace( '/i/', 'İ', $string );
+		return parent::uc( $string );
+	}
+
+	/** @see bug 28040 */
+	function lc( $string ) {
+		$string = preg_replace( '/I/', 'ı', $string );
+		return parent::lc( $string );
+	}
+
 }
Index: trunk/phase3/RELEASE-NOTES
@@ -277,7 +277,8 @@
 * (bug 27681) Set $namespaceGenderAliases for Portuguese (pt and pt-br)
 * (bug 27785) Fallback language for Kabardian (kbd) is English now.
 * (bug 27825) Raw watchlist edit message now uses formatted numbers.
-* (bug 28040) Turkish: properly lower case 'I' to 'ı' (dotless i)
+* (bug 28040) Turkish: properly lower case 'I' to 'ı' (dotless i) and
+  uppercase 'i' to 'İ' (dotted i)
 
 == Compatibility ==
 
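
The override in the diff depends on Language::uc() and Language::lc(), which are not shown on this page. As a rough, self-contained sketch of the same idea, the following maps the two Turkish letter pairs explicitly and then falls back to PHP's byte-wise strtoupper()/strtolower() for the remaining ASCII letters. The names trUc() and trLc() are hypothetical, and the array form of strtr() is a swapped-in technique: the commit itself uses preg_replace(). The sketch assumes a UTF-8 source file, UTF-8 input, and the default "C" locale.

```php
<?php
// Sketch only: trUc()/trLc() are hypothetical names, not MediaWiki API.
// Assumes this file and all input strings are UTF-8 encoded.

function trUc( $string ) {
	// Map the Turkish pairs first: i => İ (dotted), ı => I (dotless).
	// The array form of strtr() replaces whole byte sequences, so the
	// two-byte UTF-8 letters are handled as units.
	$string = strtr( $string, array( 'i' => 'İ', 'ı' => 'I' ) );
	// Upper-case the remaining ASCII letters; in the "C" locale the
	// multibyte sequences pass through unchanged.
	return strtoupper( $string );
}

function trLc( $string ) {
	// Reverse mapping: I => ı (dotless), İ => i (dotted).
	$string = strtr( $string, array( 'I' => 'ı', 'İ' => 'i' ) );
	return strtolower( $string );
}

echo trUc( 'iiiii' ), "\n"; // İİİİİ
echo trLc( 'IIIII' ), "\n"; // ııııı
```

The four-argument-free array form is used here because the three-argument strtr( $str, $from, $to ) maps single bytes, which cannot represent the two-byte letters 'ı' and 'İ'.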

Follow-up revisions

Revision | Commit summary | Author | Date
r84080 | Makes LanguageTr uc & lc match parent declaration... | hashar | 07:38, 16 March 2011
r99074 | Fixes for r84057 LanguageTr uc/lc:... | tstarling | 02:31, 6 October 2011
r99246 | Tests for bug 31490 : turkish magic word with a 'i' are broken :d... | hashar | 20:18, 7 October 2011
r99290 | Revert r84057, r84080, part of r99074: lc() and uc() custom handling for Turk... | brion | 00:30, 8 October 2011

Past revisions this follows-up on

Revision | Commit summary | Author | Date
r83970 | bug 28040 Turkish: properly lower case 'I' to 'ı' (dotless i)... | hashar | 22:14, 14 March 2011

Comments

#Comment by Hashar, 21:58, 15 March 2011

bug 28040 amended and still open.

#Comment by Raymond, 07:37, 16 March 2011

Seen on Translatewiki:

PHP Strict Standards:  Declaration of LanguageTr::lc() should be compatible with that of Language::lc() in /www/w/languages/classes/LanguageTr.php on line 43
PHP Strict Standards:  Declaration of LanguageTr::uc() should be compatible with that of Language::uc() in /www/w/languages/Language.php on line 182

#Comment by Hashar, 09:02, 16 March 2011

Should be taken care of with follow-up r84080. One of my computers had an incorrect PHP configuration, so those errors were not shown :(
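
The Strict Standards notices quoted above are raised when an overriding method's parameter list differs from the parent's; from PHP 8.0 onwards the same mismatch is a fatal error. The following is a minimal sketch of a compatible override. BaseLanguage and TurkishLanguage are hypothetical classes, and the optional $first parameter is an assumption about the parent signature, which this page does not show.

```php
<?php
// Sketch only: BaseLanguage and TurkishLanguage are hypothetical classes.
// The optional $first parameter is an ASSUMPTION about the parent signature.

class BaseLanguage {
	function uc( $str, $first = false ) {
		// ASCII-only stand-in for the generic behaviour.
		return $first ? ucfirst( $str ) : strtoupper( $str );
	}
}

class TurkishLanguage extends BaseLanguage {
	// Same parameter list as the parent, so the declarations are compatible.
	function uc( $str, $first = false ) {
		if ( $first ) {
			// Only the leading character changes case.
			if ( substr( $str, 0, 1 ) === 'i' ) {
				return 'İ' . substr( $str, 1 );
			}
			return parent::uc( $str, true );
		}
		// 'i' is a single ASCII byte, so a byte-wise replace is safe in UTF-8.
		return parent::uc( str_replace( 'i', 'İ', $str ) );
	}
}

$lang = new TurkishLanguage();
echo $lang->uc( 'iiiii' ), "\n";        // İİİİİ
echo $lang->uc( 'iphone', true ), "\n"; // İphone
```

Declaring uc( $string ) with a single parameter in the subclass, as in the diff above, is what would trigger the incompatibility under this assumed parent signature.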

#Comment by MaxSem, 18:20, 7 October 2011

Causes bug 31490 - lcfirst and ucfirst parser functions do not work

#Comment by Brion VIBBER, 18:29, 7 October 2011

Specifically it looks like it breaks case-insensitive matching of magic words that contain the letter 'i' or 'I'.

So {{ucfirst:x}} doesn't match with the 'ucfirst' keyword anymore, whereas 'UCFIRST' or 'ucfırst' do.
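
One way the behaviour described above could arise is sketched below. It models case-insensitive matching by lowercasing both the keyword and the input with a Turkish-style lc(). The helpers trLc() and matchesKeyword() are hypothetical, and storing the keyword as 'UCFIRST' is an assumption; this is not the actual MediaWiki magic word code.

```php
<?php
// Sketch only: trLc() and matchesKeyword() are hypothetical helpers, and the
// upper-case keyword 'UCFIRST' is an ASSUMPTION about how the word is stored.

function trLc( $string ) {
	// Same substitution as LanguageTr::lc() in this revision:
	// every ASCII 'I' becomes the dotless 'ı' before lowercasing the rest.
	return strtolower( str_replace( 'I', 'ı', $string ) );
}

function matchesKeyword( $input, $keyword ) {
	// Case-insensitive comparison done by lowercasing both sides.
	return trLc( $input ) === trLc( $keyword );
}

$keyword = 'UCFIRST'; // lowercases to 'ucfırst' under the Turkish rule
foreach ( array( 'ucfirst', 'UCFIRST', 'ucfırst' ) as $input ) {
	echo $input, ' => ', matchesKeyword( $input, $keyword ) ? 'match' : 'no match', "\n";
}
```

Under these assumptions the lower-case input 'ucfirst' keeps its dotted 'i' and so no longer equals the lowercased keyword, while 'UCFIRST' and 'ucfırst' both do.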

#Comment by MaxSem, 19:24, 7 October 2011

We could normalize magic words by passing them through lc() and then uc() in LocalisationCache, but I don't know what else this revision could potentially break.

#Comment by Hashar, 20:30, 7 October 2011

We should probably revert this revision on live wikis pending a proper fix.

#Comment by Szoszv, 19:39, 7 October 2011
