r108502 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r108501‎ | r108502 | r108503 >
Date:14:07, 10 January 2012
Author:reedy
Status:reverted (Comments)
Tags:
Comment:
Noticed looking at schema posted by Joerg in bug 33228

It seems many ancient tables (removed mainly in 1.4, and some in 1.6)

Kill them with fire, if they've still got unmigrated data in or something, we've got bigger issues!

Noticed "user_rights" table is not documented at Manual:Database_layout
Modified paths:
  • /trunk/phase3/includes/installer/MysqlUpdater.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/installer/MysqlUpdater.php
@@ -192,6 +192,7 @@
193193 array( 'modifyField', 'user', 'ug_group', 'patch-ug_group-length-increase.sql' ),
194194 array( 'addField', 'uploadstash', 'us_chunk_inx', 'patch-uploadstash_chunk.sql' ),
195195 array( 'addfield', 'job', 'job_timestamp', 'patch-jobs-add-timestamp.sql' ),
 196+ array( 'dropAncientTables' ),
196197 );
197198 }
198199
@@ -853,4 +854,25 @@
854855 $this->applyPatch( 'patch-user-newtalk-timestamp-null.sql' );
855856 $this->output( "done.\n" );
856857 }
 858+
 859+ protected function dropAncientTables() {
 860+ $ancientTables = array(
 861+ 'blobs', // 1.4
 862+ 'brokenlinks', // 1.4
 863+ 'cur', // 1.4
 864+ 'ip_blocks_old', // Temporary in 1.6
 865+ 'links', // 1.4
 866+ 'linkscc', // 1.4
 867+ 'old', // 1.4
 868+ 'trackback', // 1.19
 869+ 'user_rights', // 1.5
 870+ 'validate', // 1.6
 871+ );
 872+
 873+ foreach( $ancientTables as $table ) {
 874+ if ( $this->db->tableExists( $table, __METHOD__ ) ) {
 875+ $this->db->dropTable( $table, __METHOD__ );
 876+ }
 877+ }
 878+ }
857879 }

Follow-up revisions

RevisionCommit summaryAuthorDate
r108504Cleanup "text" (nee old) table, dropping old indexes and fields...reedy15:04, 10 January 2012
r108527Reverted r108502, r108504 per CRaaron18:05, 10 January 2012

Comments

#Comment by Dantman (talk | contribs)   14:11, 10 January 2012

Is this really a good idea? I've always thought it was better to leave stray data we don't understand lying around in tables we don't touch than to assume we can simply delete it. ie: If it doesn't hurt MW's functioning then it's better if we leave the data alone so that there's a reference for manual migration if we're not going to migrate automatically.

The only kind of table I think we should permit to be auto-deleted might be the tables we have that are essentially purely cache tables like objectcache or whatever it was and things like the links tables which simply reference data already in the database elsewhere.

#Comment by Hashar (talk | contribs)   17:04, 10 January 2012

Deleting trackback will cause data lost to people using this feature! Just leave them ?

#Comment by Reedy (talk | contribs)   17:14, 10 January 2012

It just feels icky, leaving these around

Taking 1.6/1.7, these tables have been sitting around for 5.5 years, and getting on for 7 years for 1.4

I would agree that aggressively killing the trackbacks table in 1.19 seems daft and too forward (though, see if there are any rows, if not, get rid, no data loss)

#Comment by RobLa-WMF (talk | contribs)   17:52, 10 January 2012

Please revert this. This isn't the type of thing we should be screwing with now.

#Comment by Duplicatebug (talk | contribs)   18:26, 10 January 2012

You can drop old tables, when you are checking first, if the table is empty. Then the table is unneeded and useless. But dropping tables with data is a bad idea. That is right.

#Comment by Reedy (talk | contribs)   18:49, 10 January 2012

But when they haven't been referenced in 5+ years of trunk? Granted, just a recent upgrade might still need it

I don't see a point in keeping ancient out of date data "just in case". It's a stupid reason. It's also a bit late to be noticing that something we did half a decade ago had issues. Chances are, that we didn't need that data anyway

But just because they aren't empty, doesn't that the data has been successfully migrated and isn't already usable in the transformed/upgraded/renamed tables

For enwiki, for example, what if we kept all these old tables around, how much data duplication is that? What's the point of keeping it for no gain?


Read this, for example: http://eandt.theiet.org/magazine/2011/12/dump-that-data.cfm


I can somewhat see Roblas point of "don't do this now", which makes sense, but similarly, there'd be no more/less issue doing it at the start of 1.20. The data still hasn't been used in X time. Certainly dropping trackback in the same release as it was deprecated/removed is gung-ho, and that's fair enough

#Comment by RobLa-WMF (talk | contribs)   19:44, 10 January 2012

This is adding to reviewer load at a time when we shouldn't be doing that. Do we really have to have this discussion now?

#Comment by Catrope (talk | contribs)   19:45, 10 January 2012

Maybe we can just move this to 1.20?

#Comment by Reedy (talk | contribs)   19:52, 10 January 2012

My plan was to put it back in 1.20!

Rob: I wasn't suggesting (or trying to!) arguing about it now, I perfectly agree. I just mean in a technical point of view, it makes no difference because the tables aren't used now. From the CR backlog PoV, it's valid enough reason to revert it for now. It was somewhat of a "and another thing" after the points above

If Aaron hadn't reverted it (when I was away for my Dinner), I would've done it when I got back

#Comment by Dantman (talk | contribs)   20:01, 10 January 2012

I still don't want to see this come back, even in 1.20.

The data's existence does not harm MediaWiki's functioning. If this data is not migrated by the installer then this would delete the very data remaining that a dba could use to fix the issue.

If we do start getting complaints "Uhm, when I upgraded from 1.x to 1.y all the data in <x> feature disappeared" (people don't upgrade from old versions often) and we say "Looks like we never bothered to migrate that data, we'll have to add a new migration", we don't want to actually have to say "Looks like we never bothered to migrate that data, however because Reedy introduced a migration that deletes old data all that data you had has been permanently deleted and we cannot fix the issue with your wiki even by patching the issue and issuing a new release of MediaWiki."

We're supposed to try and aim for more reliability for upgrading from old versions of MediaWiki to newer ones, not less. Aiming for less will just give people more excuses not to upgrade, and that's not what we want.

#Comment by Reedy (talk | contribs)   20:02, 10 January 2012

The migration will have occured in steps previous, not that we never bothered to revert.

It's stuff laying around from the time of the migrations

#Comment by Duplicatebug (talk | contribs)   20:29, 10 January 2012

You can always automatically drop old outdated empty tables, because there are no backup for some thing.

But tables with data maybe need migration or there are backup for some things or tables from other programms running in the same database. When the migration does not remove the data after the sucessfully migration, than it is not possible to determine, if you can automatically drop table without data lost. Write a Release Note, that this needs manually work.

#Comment by Reedy (talk | contribs)   20:59, 10 January 2012

A separate maintenance script ( "deleteAncientTables" or similar, with a RELEASE-NOTES entry) might be a more sensible path forward

#Comment by Catrope (talk | contribs)   21:09, 10 January 2012

+1. And let's slap a big flashy warning on it telling users it'll permanently destroy any old data they might have in there.

#Comment by Reedy (talk | contribs)   21:11, 10 January 2012

Sounds good. Best wait till 1.20 for committing it though ;)

Status & tagging log