r83970 MediaWiki - Code Review archive

Repository:	MediaWiki
Revision:	r83970
Date:	22:14, 14 March 2011
Author:	hashar
Status:	resolved
Tags:
Comment:	bug 28040 Turkish: properly lower case 'I' to 'ı' (dotless i) Turkish has two different i, one with a dot and another without a dot. They are totally different letters in this language, so we have to override the ucfirst and lcfirst methods. See http://en.wikipedia.org/wiki/Dotted_and_dotless_I Credits to #wikipedia-tr users berm, []LuCkY[] and Emperyan
Modified paths:	/trunk/phase3/RELEASE-NOTES (modified)
	/trunk/phase3/languages/classes/LanguageTr.php (modified)
	/trunk/phase3/tests/phpunit/languages/LanguageTrTest.php (added)

Diff

Index: trunk/phase3/tests/phpunit/languages/LanguageTrTest.php
@@ -0,0 +1,67 @@
+<?php
+/**
+ * @author Ashar Voultoiz
+ * @copyright Copyright © 2011, Ashar Voultoiz
+ * @file
+ */
+
+require_once dirname(dirname(__FILE__)). '/bootstrap.php';
+
+/** Tests for MediaWiki languages/LanguageTr.php */
+class LanguageTrTest extends MediaWikiTestCase {
+	private $lang;
+
+	function setUp() {
+		$this->lang = Language::factory( 'Tr' );
+	}
+	function tearDown() {
+		unset( $this->lang );
+	}
+
+	/**
+	 * See @bug 28040
+	 * Credits to #wikipedia-tr users berm, []LuCkY[] and Emperyan
+	 * @see http://en.wikipedia.org/wiki/Dotted_and_dotless_I
+	 * @dataProvider provideDottedAndDotlessI
+	 */
+	function testDottedAndDotlessI( $func, $input, $inputCase, $expected ) {
+		if( $func == 'ucfirst' ) {
+			$res = $this->lang->ucfirst( $input );
+		} elseif( $func == 'lcfirst' ) {
+			$res = $this->lang->lcfirst( $input );
+		} else {
+			throw new MWException( __METHOD__ . " given an invalid function name '$func'" );
+		}
+
+		$msg = "Converting $inputCase case '$input' with $func should give '$expected'";
+
+		$this->assertEquals( $expected, $res, $msg );
+	}
+
+	function provideDottedAndDotlessI() {
+		return array(
+			# function, input, input case, expected
+			# Case changed:
+			array( 'ucfirst', 'ı', 'lower', 'I' ),
+			array( 'ucfirst', 'i', 'lower', 'İ' ),
+			array( 'lcfirst', 'I', 'upper', 'ı' ),
+			array( 'lcfirst', 'İ', 'upper', 'i' ),
+
+			# Already using the correct case
+			array( 'ucfirst', 'I', 'upper', 'I' ),
+			array( 'ucfirst', 'İ', 'upper', 'İ' ),
+			array( 'lcfirst', 'ı', 'lower', 'ı' ),
+			array( 'lcfirst', 'i', 'lower', 'i' ),
+
+			# A real example taken from bug 28040 using
+			# http://tr.wikipedia.org/wiki/%C4%B0Phone
+			array( 'lcfirst', 'iPhone', 'lower', 'iPhone' ),
+
+			# next case is valid in Turkish but are different words if we
+			# consider IPhone is English!
+			array( 'lcfirst', 'IPhone', 'upper', 'ıPhone' ),
+
+		);
+	}
+
+}
Property changes on: trunk/phase3/tests/phpunit/languages/LanguageTrTest.php
___________________________________________________________________
Added: svn:eol-style
   + native
Index: trunk/phase3/languages/classes/LanguageTr.php
@@ -3,9 +3,15 @@
 /**
  * Turkish (Türkçe)
  *
+ * Turkish has two different i, one with a dot and another without a dot. They
+ * are totally different letters in this language, so we have to override the
+ * ucfirst and lcfirst methods.
+ * See http://en.wikipedia.org/wiki/Dotted_and_dotless_I
+ * and @bug 28040
  * @ingroup Language
  */
 class LanguageTr extends Language {
+
 	function ucfirst ( $string ) {
 		if ( !empty( $string ) && $string[0] == 'i' ) {
 			return 'İ' . substr( $string, 1 );
@@ -13,4 +19,13 @@
 			return parent::ucfirst( $string );
 		}
 	}
+
+	function lcfirst ( $string ) {
+		if ( !empty( $string ) && $string[0] == 'I' ) {
+			return 'ı' . substr( $string, 1 );
+		} else {
+			return parent::lcfirst( $string );
+		}
+	}
+
 }
Index: trunk/phase3/RELEASE-NOTES
@@ -275,6 +275,7 @@
 * (bug 27681) Set $namespaceGenderAliases for Portuguese (pt and pt-br)
 * (bug 27785) Fallback language for Kabardian (kbd) is English now.
 * (bug 27825) Raw watchlist edit message now uses formatted numbers.
+* (bug 28040) Turkish: properly lower case 'I' to 'ı' (dotless i)
 
 == Compatibility ==
 

Follow-up revisions

Revision	Commit summary	Author	Date
r84057	bug 28040 Turkish: properly handle dotted and dotless i...	hashar	21:56, 15 March 2011

Comments

#Comment by Bawolff 23:09, 14 March 2011

I don't know anything about the Turkish language, but wouldn't we want to override the lc and uc functions instead of lcfirst and ucfirst? These letters don't cease to be the capital/lowercase forms of each other when they aren't the first letter of a word.

#Comment by Hashar 21:57, 15 March 2011

Taken care of in r84057. Bug 28040 amended.
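
For illustration only, the whole-string approach raised in this thread could look roughly like the sketch below. It is not the code committed in r84057: the helper names trLower and trUpper are hypothetical, and it assumes UTF-8 input and the mbstring extension. The idea is to remap the four Turkish i letters before the generic case conversion runs, so the generic routine never sees them.

```php
<?php
// Hypothetical helpers for illustration; NOT the code from r84057.
// Assumes UTF-8 encoded strings and the mbstring extension.

/** Lowercase a whole string using the Turkish I/ı and İ/i pairs. */
function trLower( string $s ): string {
	// Remap the two Turkish capitals first; strtr() with an array does
	// plain substring replacement, which is safe on valid UTF-8.
	$s = strtr( $s, array( 'I' => 'ı', 'İ' => 'i' ) );
	return mb_strtolower( $s, 'UTF-8' );
}

/** Uppercase a whole string using the Turkish i/İ and ı/I pairs. */
function trUpper( string $s ): string {
	$s = strtr( $s, array( 'i' => 'İ', 'ı' => 'I' ) );
	return mb_strtoupper( $s, 'UTF-8' );
}

echo trLower( 'IPHONE' ), "\n";   // ıphone
echo trUpper( 'istanbul' ), "\n"; // İSTANBUL
```

Unlike the ucfirst/lcfirst overrides in this revision, these helpers apply the mapping at every position in the string, which is the point made in the first comment.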

Status & tagging log

10:57, 27 July 2011 Hashar changed the tags for r83970 [removed: patchset-turkish]
21:10, 30 June 2011 Aaron Schulz changed the status of r83970 [removed: new added: resolved]
10:07, 17 March 2011 Hashar changed the tags for r83970 [added: patchset-turkish]