r64283 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r64283
Date: 03:10, 28 March 2010
Author: conrad
Status: ok
Tags:
Comment: Re-normalize titles after html entity decoding when necessary (bug 14952)
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified)
  • /trunk/phase3/includes/Sanitizer.php (modified)
  • /trunk/phase3/includes/Title.php (modified)
  • /trunk/phase3/maintenance/parserTests.txt (modified)

Diff

Index: trunk/phase3/RELEASE-NOTES
@@ -62,6 +62,8 @@
   themselves unless they are given the 'unblockself' permission.
 * (bug 22876) Avoid possible PHP Notice if $wgDefaultUserOptions is not
   correctly set
+* (bug 14952) Page titles are renormalized after html entities are removed so that
+  links with non-NFC character references work correctly.
 
 == API changes in 1.17 ==
 * (bug 22738) Allow filtering by action type on query=logevent
Index: trunk/phase3/maintenance/parserTests.txt
@@ -4114,7 +4114,32 @@
 </p>
 !!end
 
+!! article
+אַ
+!! text
+Test for unicode normalization
+
+The page's name is U+05d0 U+05b7, with non-canonical form U+FB2E
+!! endarticle
+
 !! test
+(bug 19451) Links should refer to the normalized form.
+!! input
+[[&#xFB2E;]]
+[[&#x5d0;&#x5b7;]]
+[[&#x5d0;ַ]]
+[[א&#x5b7;]]
+[[אַ]]
+!! result
+<p><a href="https://www.mediawiki.org/wiki/%D7%90%D6%B7" title="אַ">&#xfb2e;</a>
+<a href="https://www.mediawiki.org/wiki/%D7%90%D6%B7" title="אַ">&#x5d0;&#x5b7;</a>
+<a href="https://www.mediawiki.org/wiki/%D7%90%D6%B7" title="אַ">&#x5d0;ַ</a>
+<a href="https://www.mediawiki.org/wiki/%D7%90%D6%B7" title="אַ">א&#x5b7;</a>
+<a href="https://www.mediawiki.org/wiki/%D7%90%D6%B7" title="אַ">אַ</a>
+</p>
+!! end
+
+!! test
 Empty attribute crash test (bug 2067)
 !! input
 <font color="">foo</font>
Index: trunk/phase3/includes/Title.php
@@ -127,9 +127,9 @@
 		}
 
 		/**
-		 * Convert things like &eacute; &#257; or &#x3017; into real text...
+		 * Convert things like &eacute; &#257; or &#x3017; into normalized(bug 14952) text
 		 */
-		$filteredText = Sanitizer::decodeCharReferences( $text );
+		$filteredText = Sanitizer::decodeCharReferencesAndNormalize( $text );
 
 		$t = new Title();
 		$t->mDbkeyform = str_replace( ' ', '_', $filteredText );
Index: trunk/phase3/includes/Sanitizer.php
@@ -1177,6 +1177,30 @@
 	}
 
 	/**
+	 * Decode any character references, numeric or named entities,
+	 * in the next and normalize the resulting string. (bug 14952)
+	 *
+	 * This is useful for page titles, not for text to be displayed,
+	 * MediaWiki allows HTML entities to escape normalization as a feature.
+	 *
+	 * @param $text String (already normalized, containing entities)
+	 * @return String (still normalized, without entities)
+	 */
+	public static function decodeCharReferencesAndNormalize( $text ) {
+		global $wgContLang;
+		$text = preg_replace_callback(
+			MW_CHAR_REFS_REGEX,
+			array( 'Sanitizer', 'decodeCharReferencesCallback' ),
+			$text, /* limit */ -1, $count );
+
+		if ( $count ) {
+			return $wgContLang->normalize( $text );
+		} else {
+			return $text;
+		}
+	}
+
+	/**
 	 * @param $matches String
 	 * @return String
 	 */
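
Illustrative note: the decode-then-normalize idea in this revision can be tried outside MediaWiki. The sketch below is a standalone Python approximation, not the committed code. It assumes that `$wgContLang->normalize()` amounts to Unicode NFC normalization, and it stands in `html.unescape` for the `MW_CHAR_REFS_REGEX` callback. The function name `decode_char_references_and_normalize` is hypothetical and only mirrors the PHP method name. It checks that the entity-based spellings of the title from the new parser test all collapse to U+05D0 U+05B7.

```python
import html
import unicodedata


def decode_char_references_and_normalize(text: str) -> str:
    """Decode HTML character references, then NFC-normalize if anything changed."""
    decoded = html.unescape(text)
    if decoded == text:
        # Nothing was decoded, so the input is returned untouched.
        # This approximates the `$count` check in the PHP method.
        return text
    # Assumption: NFC is the normalization form applied to page titles.
    return unicodedata.normalize("NFC", decoded)


# Entity-based spellings taken from the parser test input above.
forms = ["&#xFB2E;", "&#x5d0;&#x5b7;", "&#x5d0;\u05b7", "\u05d0&#x5b7;"]
titles = {decode_char_references_and_normalize(f) for f in forms}
print(titles == {"\u05d0\u05b7"})  # prints True
```

U+FB2E has a canonical decomposition to U+05D0 U+05B7 and is excluded from recomposition, which is why NFC yields the two-code-point form that appears in the expected `href` values of the parser test.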

Status & tagging log