r83812 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r83811‎ | r83812 | r83813 >
Date:10:35, 13 March 2011
Author:catrope
Status:reverted (Comments)
Tags:
Comment:
Schema change: change cl_type from ENUM('page', 'subcat', 'file') to varchar(6). This is needed because MySQL sorts 'page' < 'subcat' < 'file' when using ORDER BY cl_type but uses 'file' < 'page' < 'subcat' for the purposes of WHERE clauses, making paging impossible. Changing the ENUM() order to be alphabetical would fix the order discrepancy, but leave range scans like WHERE cl_type > 'page' unindexed. Varchars do behave correctly. Changing to an int was not an option because existing data would have to be migrated.

This commit does not include a patch for SQLite, because ALTER TABLE MODIFY is apparently not supported by SQLite as far as I could tell by Googling. Leaving resolution of this issue for SQLite to the SQLite experts; maybe SQLite's enum implementation is saner than MySQL and it doesn't even need this schema change, I don't know.
Modified paths:
  • /trunk/phase3/includes/installer/MysqlUpdater.php (modified) (history)
  • /trunk/phase3/maintenance/archives/patch-cl_type.sql (added) (history)
  • /trunk/phase3/maintenance/tables.sql (modified) (history)

Diff [purge]

Index: trunk/phase3/maintenance/archives/patch-cl_type.sql
@@ -0,0 +1,6 @@
 2+--
 3+-- Change cl_type to a varchar from an enum because of the weird semantics of
 4+-- the < and > operators when working with enums
 5+--
 6+
 7+ALTER TABLE categorylinks MODIFY cl_type varchar(6) NOT NULL default 'page';
Property changes on: trunk/phase3/maintenance/archives/patch-cl_type.sql
___________________________________________________________________
Added: svn:eol-style
18 + native
Index: trunk/phase3/maintenance/tables.sql
@@ -521,7 +521,9 @@
522522 -- paginate the three categories separately. This never has to be updated
523523 -- after the page is created, since none of these page types can be moved to
524524 -- any other.
525 - cl_type ENUM('page', 'subcat', 'file') NOT NULL default 'page'
 525+ -- This used to be ENUM('page', 'subcat', 'file') but was changed to a
 526+ -- varchar because of the weird semantics of < and > when used on enums
 527+ cl_type varchar(6) NOT NULL default 'page'
526528 ) /*$wgDBTableOptions*/;
527529
528530 CREATE UNIQUE INDEX /*i*/cl_from ON /*_*/categorylinks (cl_from,cl_to);
Index: trunk/phase3/includes/installer/MysqlUpdater.php
@@ -175,6 +175,7 @@
176176 array( 'dropIndex', 'archive', 'ar_page_revid', 'patch-archive_kill_ar_page_revid.sql' ),
177177 array( 'addIndex', 'archive', 'ar_revid', 'patch-archive_ar_revid.sql' ),
178178 array( 'doLangLinksLengthUpdate' ),
 179+ array( 'doClTypeVarcharUpdate' ),
179180 );
180181 }
181182
@@ -828,4 +829,18 @@
829830 $this->output( "...ll_lang is up-to-date.\n" );
830831 }
831832 }
 833+
 834+ protected function doClTypeVarcharUpdate() {
 835+ $categorylinks = $this->db->tableName( 'categorylinks' );
 836+ $res = $this->db->query( "SHOW COLUMNS FROM $categorylinks LIKE 'cl_type'" );
 837+ $row = $this->db->fetchObject( $res );
 838+
 839+ if ( $row && substr( $row->Type, 0, 4 ) == 'enum' ) {
 840+ $this->output( 'Changing cl_type from enum to varchar...' );
 841+ $this->applyPatch( 'patch-cl_type.sql' );
 842+ $this->output( "done.\n" );
 843+ } else {
 844+ $this->output( "...cl_type is up-to-date.\n" );
 845+ }
 846+ }
832847 }

Follow-up revisions

RevisionCommit summaryAuthorDate
r83814(bug 27965) Paging in list=categorymembers was completely broken. It was pagi...catrope10:39, 13 March 2011
r83821Followup r83812, add /*_*/ to allow for prefixed tablesreedy13:10, 13 March 2011
r83993r83812, r83814: Don't use cl_type at all when paging categorylinks...saper02:53, 15 March 2011
r84392Per r83812 CR, solve the categorymembers paging problem by doing separate que...catrope16:25, 20 March 2011
r84394Revert r83812 (schema change for cl_type enum), no longer needed after r84392...catrope16:30, 20 March 2011

Comments

#Comment by Reedy (talk | contribs)   12:30, 13 March 2011

Marking fixme, patch doesn't allow table prefixed ;-)

#Comment by Bryan (talk | contribs)   16:08, 13 March 2011

Sqlite does not support enums. DatabaseSqlite::replaceVars() replaces ENUM types with TEXT types, so it should not need a patch.

#Comment by Saper (talk | contribs)   01:15, 14 March 2011

maintenance/update.php just failed on my empty trunk test wiki:

...ar_page_revid key doesn't exist. ...ar_revid key already set on archive table. ...ll_lang is up-to-date. Changing cl_type from enum to varchar...A database query syntax error has occurred. The last attempted database query was: "ALTER TABLE `categorylinks` MODIFY cl_type varchar(6) NOT NULL default 'page' " from within function "DatabaseBase::sourceFile( /home/wiki/root/rcw/maintenance/archives/patch-cl_type.sql )". Database returned error "1071: Specified key was too long; max key length is 1000 bytes (localhost)"

#Comment by Bryan (talk | contribs)   08:42, 14 March 2011

Are you running with $wgMysql5 = true?

#Comment by Saper (talk | contribs)   12:25, 14 March 2011

Yes, of course I do.

#Comment by Saper (talk | contribs)   03:32, 14 March 2011

It's possible to

ALTER TABLE /*_*/categorylinks MODIFY cl_type INT NOT NULL default 1;

Running this on plwiki_p categorylinks table (2237191 rows) takes 3 minutes on the toolserver:

Job 244779 (altercltypes) Complete
 User             = saper
 Queue            = all.q@yarrow.toolserver.org
 Host             = yarrow.toolserver.org
 Start Time       = 03/14/2011 03:16:56
 End Time         = 03/14/2011 03:19:22
 User Time        = 00:00:00
 System Time      = 00:00:00
 Wallclock Time   = 00:02:26
 CPU              = 00:00:00
 Max vmem         = 10.840M
 Exit Status      = 0

You actually get values 1 for page, 2 frr subcat, 3 for file.

Before conversion (a copy of current plwiki_p on the toolserver):

mysql> select cl_type,count(cl_type) from plwiki_p.categorylinks group by cl_type;
+---------+----------------+
| cl_type | count(cl_type) |
+---------+----------------+
| page    |        2099095 |
| subcat  |         137672 |
| file    |            424 |
+---------+----------------+
3 rows in set (1 min 4.96 sec)

After conversion:

mysql> select cl_type,count(cl_type) from categorylinks group by cl_type;
+---------+----------------+
| cl_type | count(cl_type) |
+---------+----------------+
|       1 |        2099095 |
|       2 |         137672 |
|       3 |            424 |
+---------+----------------+
3 rows in set (1.10 sec)

Adding another table (clt_desc?) where cl_type a foreign key to the clt_id would be beneficial, so that code still uses ('page', 'subcat', 'file')

#Comment by Catrope (talk | contribs)   08:53, 14 March 2011

Does varbinary(6) work?

#Comment by Tim Starling (talk | contribs)   04:10, 18 March 2011

Would it be possible to solve the API paging problem with the existing schema, by splitting the query up into several queries, one for each type? ApiQueryGeneratorBase::select() is short and simple, it could be overridden in ApiQueryCategoryMembers or just not called. Instead of

				$this->addWhere( "cl_type $op $escType OR " .
					"(cl_type = $escType AND " .

you would leave the cl_type out of the condition initially, and have something like:

$allTypes = array( 'page', 'file', 'subcat' );
$contTypeIndex = array_search( $cont[0], $allTypes );
$queryTypes = array_slice( $allTypes, $contTypeIndex );

And then instead of $this->select():

foreach ( $queryTypes as $type ) {
    $where = array_merge( $this->where, array( 'cl_type' => $type ) );
    $res = $dbr->select( ... );
    $rows = array_merge( $rows, iterator_to_array( $res ) );
    if ( count( $rows ) >= $limit ) {
        break;
    }
}

Changing the schema is time-consuming and disruptive, and in the light of a recent feature requirements discussion on IRC, it's not really clear to me whether this schema is sufficient to meet future demands. Also, we don't know yet what the size distribution of ICU sort keys will be, and whether 230 bytes will be enough. A stopgap measure like this would give us some time to review and design.

<Headbomb> question: has sorting changed recently?
<Headbomb> particularly, in http://en.wikipedia.org/wiki/Category:Fergie "Book:Fergie" sorts as uppercase Beta, rather than lowercase beta
...
<AryehGregor> I guess you were using it as a hack to get books to sort under their own heading, and now we've spoiled the aesthetic effect.
<Headbomb> AryehGregor: thanks
<AryehGregor> Too bad, hacks break sometimes. Pick another character that looks like a B.
<Headbomb> AryehGregor: yeah, books, templates, wikipedia were sorting under Beta, Tau, and Omega
<Headbomb> greek for B, T, and W

#Comment by Catrope (talk | contribs)   12:22, 18 March 2011

That'd certainly be possible, yes. I'll implement it.

#Comment by Catrope (talk | contribs)   16:28, 20 March 2011

Done in r84392. Your example code was very reusable (I copypasted it and had to change relatively little to make it work), thanks for that.

#Comment by Reedy (talk | contribs)   19:54, 20 March 2011

Do we need to create something to change the varchar back to an enum, as it'll possibly affect development wikis, and possibly places like TW.

Some sort of transient check based on type to force it back again?

#Comment by Catrope (talk | contribs)   19:56, 20 March 2011

There's no compelling reason to change it back, other than maybe index length overruns if cl_sortkey is enlarged in the future.

Status & tagging log