r86623 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r86622‎ | r86623 | r86624 >
Date:14:02, 21 April 2011
Author:demon
Status:reverted (Comments)
Tags:needs-php-test 
Comment:
(bug 28643) Merge Serbian language variant conversion improvements to trunk (r85224, r85239, r85308) from Nikola's branch
Modified paths:
  • /trunk/phase3/languages (modified) (history)
  • /trunk/phase3/languages/LanguageConverter.php (modified) (history)
  • /trunk/phase3/languages/classes/LanguageSr.php (modified) (history)

Diff [purge]

Index: trunk/phase3/languages/LanguageConverter.php
@@ -322,6 +322,10 @@
323323 }
324324 }
325325
 326+ if( $this->guessVariant( $text, $toVariant ) ) {
 327+ return $text;
 328+ }
 329+
326330 /* we convert everything except:
327331 1. HTML markups (anything between < and >)
328332 2. HTML entities
@@ -571,7 +575,7 @@
572576 */
573577 public function convertTo( $text, $variant ) {
574578 global $wgDisableLangConversion;
575 - if ( $wgDisableLangConversion ) {
 579+ if ( $wgDisableLangConversion || $this->guessVariant( $text, $variant ) ) {
576580 return $text;
577581 }
578582 return $this->recursiveConvertTopLevel( $text, $variant );
@@ -769,6 +773,20 @@
770774 }
771775
772776 /**
 777+ * Guess if a text is written in a variant. This should be implemented in subclasses.
 778+ *
 779+ * @param string $text the text to be checked
 780+ * @param string $variant language code of the variant to be checked for
 781+ * @return bool true if $text appears to be written in $variant, false if not
 782+ *
 783+ * @author Nikola Smolenski <smolensk@eunet.rs>
 784+ * @since 1.18
 785+ */
 786+ public function guessVariant($text, $variant) {
 787+ return false;
 788+ }
 789+
 790+ /**
773791 * Load default conversion tables.
774792 * This method must be implemented in derived class.
775793 *
Index: trunk/phase3/languages/classes/LanguageSr.php
@@ -148,6 +148,27 @@
149149
150150 return $ret;
151151 }
 152+
 153+ /**
 154+ * Guess if a text is written in Cyrillic or Latin.
 155+ *
 156+ * @author Nikola Smolenski <smolensk@eunet.rs>
 157+ * @since 1.18
 158+ */
 159+ public function guessVariant( $text, $variant ) {
 160+ $numCyrillic = preg_match_all("/[шђчћжШЂЧЋЖ]/u", $text, $dummy);
 161+ $numLatin = preg_match_all("/[šđč枊ĐČĆŽ]/u", $text, $dummy);
 162+
 163+ if( $variant == 'sr-ec' ) {
 164+ return $numCyrillic > $numLatin;
 165+ } else if( $variant == 'sr-el' ) {
 166+ return $numLatin > $numCyrillic;
 167+ } else {
 168+ return false;
 169+ }
 170+
 171+ }
 172+
152173 }
153174
154175 /**
@@ -202,4 +223,5 @@
203224 }
204225 }
205226 }
 227+
206228 }
Property changes on: trunk/phase3/languages
___________________________________________________________________
Added: svn:mergeinfo
207229 Merged /branches/new-installer/phase3/languages:r43664-66004
208230 Merged /branches/REL1_15/phase3/languages:r51646
209231 Merged /branches/REL1_17/phase3/languages:r81445,81448
210232 Merged /branches/nikola/phase3/languages:r85224,85239,85308
211233 Merged /branches/sqlite/languages:r58211-58321

Follow-up revisions

RevisionCommit summaryAuthorDate
r90604Follow up r86623. Remember to add the proper wfProfileOut when you add a new ...platonides21:07, 22 June 2011
r90951Revert r86623: merging Serbian language improvements to trunk. Per CR: this n...demon12:00, 28 June 2011
r101369(bug 28643) tests for Serbian script conversions...hashar17:30, 31 October 2011
r103327bug 28643 improvement to serbian variants conversion...hashar15:12, 16 November 2011

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r85224Don't convert text to a variant, if it is already written in that variant (to...nikola02:25, 3 April 2011
r85239Minor fixesnikola10:27, 3 April 2011
r85308@since 1.17 → @since 1.18nikola03:07, 4 April 2011

Comments

#Comment by Brion VIBBER (talk | contribs)   00:42, 15 June 2011

This sounds like it will fail to convert mixed input strings; for instance if you give it a text that's mostly Latin and throws in a word of cyrillic, the cyrillic word won't get converted because the count of special Latin chars will outnumber the count of special cyrillic chars.

I strongly recommend adding some test cases to confirm that behavior remains as expected.

#Comment by Nikola Smolenski (talk | contribs)   14:50, 31 August 2011

Yes, and this is desired behavior. If you have a text that's mostly Latin with a word in Cyrillic, that word is most likely a foreign word that does not need to be converted to Latin.

I agree that tests would be nice though.

#Comment by Hashar (talk | contribs)   10:54, 24 October 2011

Removing 'needs-php-test' tags. I have added it to follow up r90951 which supersede this commit.

#Comment by Hashar (talk | contribs)   10:54, 24 October 2011

Well follow up was just a revert :/ I am adding back the keyword.

Status & tagging log