r83970 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r83970
Date:22:14, 14 March 2011
Author:hashar
Status:resolved (Comments)
Tags:
Comment:
bug 28040 Turkish: properly lower case 'I' to 'ı' (dotless i)

Turkish has two different i, one with a dot and another without a dot. They
are totally different letters in this language, so we have to override the
ucfirst and lcfirst methods.
See http://en.wikipedia.org/wiki/Dotted_and_dotless_I

Credits to #wikipedia-tr users berm, []LuCkY[] and Emperyan
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified)
  • /trunk/phase3/languages/classes/LanguageTr.php (modified)
  • /trunk/phase3/tests/phpunit/languages/LanguageTrTest.php (added)

Diff

Index: trunk/phase3/tests/phpunit/languages/LanguageTrTest.php
@@ -0,0 +1,67 @@
+<?php
+/**
+ * @author Ashar Voultoiz
+ * @copyright Copyright © 2011, Ashar Voultoiz
+ * @file
+ */
+
+require_once dirname(dirname(__FILE__)). '/bootstrap.php';
+
+/** Tests for MediaWiki languages/LanguageTr.php */
+class LanguageTrTest extends MediaWikiTestCase {
+	private $lang;
+
+	function setUp() {
+		$this->lang = Language::factory( 'Tr' );
+	}
+	function tearDown() {
+		unset( $this->lang );
+	}
+
+	/**
+	 * See @bug 28040
+	 * Credits to #wikipedia-tr users berm, []LuCkY[] and Emperyan
+	 * @see http://en.wikipedia.org/wiki/Dotted_and_dotless_I
+	 * @dataProvider provideDottedAndDotlessI
+	 */
+	function testDottedAndDotlessI( $func, $input, $inputCase, $expected ) {
+		if( $func == 'ucfirst' ) {
+			$res = $this->lang->ucfirst( $input );
+		} elseif( $func == 'lcfirst' ) {
+			$res = $this->lang->lcfirst( $input );
+		} else {
+			throw new MWException( __METHOD__ . " given an invalid function name '$func'" );
+		}
+
+		$msg = "Converting $inputCase case '$input' with $func should give '$expected'";
+
+		$this->assertEquals( $expected, $res, $msg );
+	}
+
+	function provideDottedAndDotlessI() {
+		return array(
+			# function, input, input case, expected
+			# Case changed:
+			array( 'ucfirst', 'ı', 'lower', 'I' ),
+			array( 'ucfirst', 'i', 'lower', 'İ' ),
+			array( 'lcfirst', 'I', 'upper', 'ı' ),
+			array( 'lcfirst', 'İ', 'upper', 'i' ),
+
+			# Already using the correct case
+			array( 'ucfirst', 'I', 'upper', 'I' ),
+			array( 'ucfirst', 'İ', 'upper', 'İ' ),
+			array( 'lcfirst', 'ı', 'lower', 'ı' ),
+			array( 'lcfirst', 'i', 'lower', 'i' ),
+
+			# A real example taken from bug 28040 using
+			# http://tr.wikipedia.org/wiki/%C4%B0Phone
+			array( 'lcfirst', 'iPhone', 'lower', 'iPhone' ),
+
+			# next case is valid in Turkish but are different words if we
+			# consider IPhone is English!
+			array( 'lcfirst', 'IPhone', 'upper', 'ıPhone' ),
+
+		);
+	}
+
+}
Property changes on: trunk/phase3/tests/phpunit/languages/LanguageTrTest.php
___________________________________________________________________
Added: svn:eol-style
   + native
Index: trunk/phase3/languages/classes/LanguageTr.php
@@ -3,9 +3,15 @@
 /**
  * Turkish (Türkçe)
  *
+ * Turkish has two different i, one with a dot and another without a dot. They
+ * are totally different letters in this language, so we have to override the
+ * ucfirst and lcfirst methods.
+ * See http://en.wikipedia.org/wiki/Dotted_and_dotless_I
+ * and @bug 28040
  * @ingroup Language
  */
 class LanguageTr extends Language {
+
 	function ucfirst ( $string ) {
 		if ( !empty( $string ) && $string[0] == 'i' ) {
 			return 'İ' . substr( $string, 1 );
@@ -13,4 +19,13 @@
 			return parent::ucfirst( $string );
 		}
 	}
+
+	function lcfirst ( $string ) {
+		if ( !empty( $string ) && $string[0] == 'I' ) {
+			return 'ı' . substr( $string, 1 );
+		} else {
+			return parent::lcfirst( $string );
+		}
+	}
+
 }
Index: trunk/phase3/RELEASE-NOTES
@@ -275,6 +275,7 @@
 * (bug 27681) Set $namespaceGenderAliases for Portuguese (pt and pt-br)
 * (bug 27785) Fallback language for Kabardian (kbd) is English now.
 * (bug 27825) Raw watchlist edit message now uses formatted numbers.
+* (bug 28040) Turkish: properly lower case 'I' to 'ı' (dotless i)
 
 == Compatibility ==
 
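
The overrides in this diff compare `$string[0]`, which in PHP is a single byte, so they only intercept the ASCII letters 'i' and 'I'; the two-byte UTF-8 letters 'ı' and 'İ' fall through to the parent methods. As an illustration only, here is a minimal standalone sketch of the same first-letter rule that handles all four letters explicitly. It is not MediaWiki code: `trUcfirst` and `trLcfirst` are hypothetical names, and the sketch assumes the mbstring extension is loaded and that both the source file and the input strings are UTF-8.

```php
<?php
// Hypothetical helpers for illustration; not part of MediaWiki's Language class.
// Assumes ext-mbstring and UTF-8 encoded source and input.

/** Upper-case the first character, mapping dotted 'i' to 'İ'. */
function trUcfirst( string $s ): string {
	if ( $s === '' ) {
		return $s;
	}
	// mb_substr splits on characters, not bytes.
	$first = mb_substr( $s, 0, 1, 'UTF-8' );
	$rest = mb_substr( $s, 1, null, 'UTF-8' );
	if ( $first === 'i' ) {
		return 'İ' . $rest;
	}
	// Dotless 'ı' upper-cases to plain 'I' under the default Unicode mapping.
	return mb_strtoupper( $first, 'UTF-8' ) . $rest;
}

/** Lower-case the first character, mapping 'I' to 'ı' and 'İ' to 'i'. */
function trLcfirst( string $s ): string {
	if ( $s === '' ) {
		return $s;
	}
	$first = mb_substr( $s, 0, 1, 'UTF-8' );
	$rest = mb_substr( $s, 1, null, 'UTF-8' );
	if ( $first === 'I' ) {
		return 'ı' . $rest;
	}
	if ( $first === 'İ' ) {
		// Handled explicitly so the result is a plain 'i' regardless of
		// how the mbstring version in use lower-cases 'İ'.
		return 'i' . $rest;
	}
	return mb_strtolower( $first, 'UTF-8' ) . $rest;
}
```

The inputs and expected values in `provideDottedAndDotlessI` can be fed to these two functions directly; only the first character is ever changed, as in the committed methods.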

Follow-up revisions

Revision | Commit summary | Author | Date
r84057 | bug 28040 Turkish: properly handle dotted and dotless i... | hashar | 21:56, 15 March 2011

Comments

#Comment by Bawolff (talk | contribs)   23:09, 14 March 2011

I don't know anything about the Turkish language, but wouldn't we want to override the lc and uc functions instead of lcfirst and ucfirst? These letters don't cease to be the capital/lowercase forms of each other when they're not the first letter of a word.

#Comment by Hashar (talk | contribs)   21:57, 15 March 2011

Taken care of with r84057. Bug 28040 amended.
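
The whole-string conversion raised in this thread could look roughly like the sketch below. The contents of r84057 are not shown on this page, so this is not a reconstruction of that revision; `trUc` and `trLc` are hypothetical standalone functions, and the sketch assumes the mbstring extension and UTF-8 input. The idea is to remap the Turkish-specific letters first and then let the generic Unicode case conversion handle everything else.

```php
<?php
// Hypothetical helpers for illustration; not the code from r84057.
// Assumes ext-mbstring and UTF-8 encoded source and input.

/** Upper-case a whole string with Turkish i handling. */
function trUc( string $s ): string {
	// Map dotted 'i' to 'İ' before the generic conversion would turn it into 'I'.
	// Dotless 'ı' is left for mb_strtoupper, which maps it to 'I'.
	return mb_strtoupper( strtr( $s, [ 'i' => 'İ' ] ), 'UTF-8' );
}

/** Lower-case a whole string with Turkish i handling. */
function trLc( string $s ): string {
	// Map both capitals up front; the generic conversion then leaves
	// the resulting 'ı' and 'i' untouched.
	return mb_strtolower( strtr( $s, [ 'I' => 'ı', 'İ' => 'i' ] ), 'UTF-8' );
}
```

Using `strtr` with an array is byte-based, which is safe here because a valid UTF-8 string cannot contain the bytes of 'I', 'i' or 'İ' inside another character.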
