r69977 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r69976‎ | r69977 | r69978 >
Date:22:04, 26 July 2010
Author:simetrical
Status:resolved (Comments)
Tags:
Comment:
Fix bug in prefixing scheme

As Bawolff pointed out at [[mw:User talk:Simetrical/Collation]], the
prefixing scheme I was using meant that the page "Z" with sort key of
"F" would sort after a page named "A" with a sort key of "FF", since the
first one's raw sort key would compute to "FZ", and the second's would
compute to "FFA". I've fixed this by separating the prefix from the
unprefixed part by a null byte (cl_sortkey is eventually going to be
totally binary anyway, may as well start now).
Modified paths:
  • /trunk/phase3/includes/CategoryPage.php (modified) (history)
  • /trunk/phase3/includes/LinksUpdate.php (modified) (history)
  • /trunk/phase3/includes/Title.php (modified) (history)
  • /trunk/phase3/languages/Language.php (modified) (history)
  • /trunk/phase3/maintenance/updateCollation.php (modified) (history)

Diff [purge]

Index: trunk/phase3/maintenance/updateCollation.php
@@ -57,11 +57,10 @@
5858 $dbw->begin();
5959 foreach ( $res as $row ) {
6060 $title = Title::newFromRow( $row );
61 - $rawSortkey = $title->getCategorySortkey();
6261 if ( $row->cl_collation == 0 ) {
6362 # This is an old-style row, so the sortkey needs to be
6463 # converted.
65 - if ( $row->cl_sortkey == $rawSortkey ) {
 64+ if ( $row->cl_sortkey == $title->getCategorySortkey() ) {
6665 $prefix = '';
6766 } else {
6867 # Custom sortkey, use it as a prefix
@@ -82,7 +81,8 @@
8382 $dbw->update(
8483 'categorylinks',
8584 array(
86 - 'cl_sortkey' => $wgContLang->convertToSortkey( $prefix . $rawSortkey ),
 85+ 'cl_sortkey' => $wgContLang->convertToSortkey(
 86+ $title->getCategorySortkey( $prefix ) ),
8787 'cl_sortkey_prefix' => $prefix,
8888 'cl_collation' => $wgCollationVersion,
8989 'cl_type' => $type,
Index: trunk/phase3/includes/CategoryPage.php
@@ -312,7 +312,7 @@
313313 $count = 0;
314314 foreach ( $res as $row ) {
315315 $title = Title::newFromRow( $row );
316 - $rawSortkey = $row->cl_sortkey_prefix . $title->getCategorySortkey();
 316+ $rawSortkey = $title->getCategorySortkey( $row->cl_sortkey_prefix );
317317
318318 if ( ++$count > $this->limit ) {
319319 # We've reached the one extra which shows that there
Index: trunk/phase3/includes/LinksUpdate.php
@@ -457,7 +457,7 @@
458458 # order or such.
459459 $prefix = $sortkey;
460460 $sortkey = $wgContLang->convertToSortkey(
461 - $prefix . $this->mTitle->getCategorySortkey() );
 461+ $this->mTitle->getCategorySortkey( $prefix ) );
462462 }
463463
464464 $arr[] = array(
Index: trunk/phase3/includes/Title.php
@@ -4139,20 +4139,29 @@
41404140 }
41414141
41424142 /**
4143 - * Returns what the default sort key for categories would be, if
4144 - * {{defaultsort:}} isn't used. This is the same as getText() for
4145 - * categories, and for everything if $wgCategoryPrefixedDefaultSortkey is
4146 - * false; otherwise it's the same as getPrefixedText().
 4143+ * Returns the raw sort key to be used for categories, with the specified
 4144+ * prefix. This will be fed to Language::convertToSortkey() to get a
 4145+ * binary sortkey that can be used for actual sorting.
41474146 *
 4147+ * @param $prefix string The prefix to be used, specified using
 4148+ * {{defaultsort:}} or like [[Category:Foo|prefix]]. Empty for no
 4149+ * prefix.
41484150 * @return string
41494151 */
4150 - public function getCategorySortkey() {
 4152+ public function getCategorySortkey( $prefix = '' ) {
41514153 global $wgCategoryPrefixedDefaultSortkey;
41524154 if ( $this->getNamespace() == NS_CATEGORY
41534155 || !$wgCategoryPrefixedDefaultSortkey ) {
4154 - return $this->getText();
 4156+ $unprefixed = $this->getText();
41554157 } else {
4156 - return $this->getPrefixedText();
 4158+ $unprefixed = $this->getPrefixedText();
41574159 }
 4160+ if ( $prefix !== '' ) {
 4161+ # Separate with a null byte, so the unprefixed part is only used as
 4162+ # a tiebreaker when two pages have the exact same prefix -- null
 4163+ # sorts before everything else (hopefully).
 4164+ return "$prefix\0$unprefixed";
 4165+ }
 4166+ return $unprefixed;
41584167 }
41594168 }
Index: trunk/phase3/languages/Language.php
@@ -2938,10 +2938,10 @@
29392939 /**
29402940 * Given a string, convert it to a (hopefully short) key that can be used
29412941 * for efficient sorting. A binary sort according to the sortkeys
2942 - * corresponds to a logical sort of the corresponding strings. Applying
2943 - * this to cl_sortkey_prefix concatenated with the page title (possibly
2944 - * with namespace prefix, depending on $wgCategoryPrefixedDefaultSortkey)
2945 - * gives you cl_sortkey.
 2942+ * corresponds to a logical sort of the corresponding strings. Current
 2943+ * code expects that a null character should sort before all others, but
 2944+ * has no other particular expectations (and that one can be changed if
 2945+ * necessary).
29462946 *
29472947 * @param string $string UTF-8 string
29482948 * @return string Binary sortkey
@@ -2988,6 +2988,9 @@
29892989 * @return string UTF-8 string corresponding to the first letter of input
29902990 */
29912991 public function firstLetterForLists( $string ) {
 2992+ if ( $string[0] == "\0" ) {
 2993+ $string = substr( $string, 1 );
 2994+ }
29922995 return strtoupper( mb_substr( $string, 0, 1 ) );
29932996 }
29942997 }

Comments

#Comment by Tim Starling (talk | contribs)   05:59, 20 January 2011

Null is ignorable in UCA, which would have caused a lot of problems, so I changed it to LF in r80443.

Is it really necessary to strip leading nulls in firstLetterForLists()? I removed that feature in r80443. It seems that Title::getCategorySortkey() is removing the leading nulls (or LFs) itself.

#Comment by Simetrical (talk | contribs)   11:59, 21 January 2011

IIRC, stripping the leading null was actually necessary when I added it, and stuff did sort wrong before I did it, although I can't swear to it. It might be that it was necessary at some point while I was writing the code, but then by the time I finished and did the commit it was no longer needed. Possibly I didn't special-case empty prefixes originally in getCategorySortkey(). There's certainly no profound reason for it – if it looks like it's unnecessary, it probably is.

Status & tagging log