r94289 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r94288‎ | r94289 | r94290 >
Date:21:52, 11 August 2011
Author:aaron
Status:reverted (Comments)
Tags:
Comment:
* Added rev_sha1 and ar_sha1 columns to revision/archive tables (useful for bug 25312)
* Created a script to populate these fields (doesn't handle archive rows without ar_rev_id set though)
Modified paths:
  • /trunk/phase3/includes/AutoLoader.php (modified) (history)
  • /trunk/phase3/includes/installer/MysqlUpdater.php (modified) (history)
  • /trunk/phase3/maintenance/archives/patch-ar_sha1.sql (added) (history)
  • /trunk/phase3/maintenance/archives/patch-rev_sha1.sql (added) (history)
  • /trunk/phase3/maintenance/populateRevisionSha1.php (added) (history)
  • /trunk/phase3/maintenance/tables.sql (modified) (history)

Diff [purge]

Index: trunk/phase3/maintenance/archives/patch-ar_sha1.sql
@@ -0,0 +1,3 @@
 2+-- Adding ar_sha1 field
 3+ALTER TABLE /*$wgDBprefix*/archive
 4+ ADD ar_sha1 varbinary(32) NOT NULL default '';
Property changes on: trunk/phase3/maintenance/archives/patch-ar_sha1.sql
___________________________________________________________________
Added: svn:eol-style
15 + native
Index: trunk/phase3/maintenance/archives/patch-rev_sha1.sql
@@ -0,0 +1,3 @@
 2+-- Adding rev_sha1 field
 3+ALTER TABLE /*$wgDBprefix*/revision
 4+ ADD rev_sha1 varbinary(32) NOT NULL default '';
Property changes on: trunk/phase3/maintenance/archives/patch-rev_sha1.sql
___________________________________________________________________
Added: svn:eol-style
15 + native
Index: trunk/phase3/maintenance/populateRevisionSha1.php
@@ -0,0 +1,96 @@
 2+<?php
 3+/**
 4+ * Fills the rev_sha1 and ar_sha1 columns of revision & archive tables.
 5+ *
 6+ * This program is free software; you can redistribute it and/or modify
 7+ * it under the terms of the GNU General Public License as published by
 8+ * the Free Software Foundation; either version 2 of the License, or
 9+ * (at your option) any later version.
 10+ *
 11+ * This program is distributed in the hope that it will be useful,
 12+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
 13+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 14+ * GNU General Public License for more details.
 15+ *
 16+ * You should have received a copy of the GNU General Public License along
 17+ * with this program; if not, write to the Free Software Foundation, Inc.,
 18+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 19+ * http://www.gnu.org/copyleft/gpl.html
 20+ *
 21+ * @ingroup Maintenance
 22+ */
 23+
 24+require_once( dirname( __FILE__ ) . '/Maintenance.php' );
 25+
 26+class PopulateRevisionSha1 extends Maintenance {
 27+ public function __construct() {
 28+ parent::__construct();
 29+ $this->mDescription = "Populates the rev_sha1 and ar_sha1 fields";
 30+ $this->setBatchSize( 150 );
 31+ }
 32+
 33+ public function execute() {
 34+ $db = wfGetDB( DB_MASTER );
 35+
 36+ $this->output( "Populating rev_len column\n" );
 37+ $this->doSha1Updates( $db, 'revision', 'rev_id', 'rev' );
 38+
 39+ $this->output( "Populating ar_len column\n" );
 40+ $this->doSha1Updates( $db, 'archive', 'ar_rev_id', 'ar' );
 41+
 42+ if ( $db->insert(
 43+ 'updatelog',
 44+ array( 'ul_key' => 'populate rev_sha1' ),
 45+ __METHOD__,
 46+ 'IGNORE'
 47+ )
 48+ ) {
 49+ $this->output( "rev_sha1 and ar_sha1 population complete.\n" );
 50+ return true;
 51+ } else {
 52+ $this->output( "Could not insert rev_sha1 population row.\n" );
 53+ return false;
 54+ }
 55+ }
 56+
 57+ protected function doSha1Updates( $db, $table, $idCol, $prefix ) {
 58+ $start = $db->selectField( $table, "MIN($idCol)", "$idCol IS NOT NULL", __METHOD__ );
 59+ if ( !$start ) {
 60+ $this->output( "Nothing to do.\n" );
 61+ return true;
 62+ }
 63+ $end = $db->selectField( $table, "MAX($idCol)", "$idCol IS NOT NULL", __METHOD__ );
 64+
 65+ # Do remaining chunk
 66+ $end += $this->mBatchSize - 1;
 67+ $blockStart = $start;
 68+ $blockEnd = $start + $this->mBatchSize - 1;
 69+ while ( $blockEnd <= $end ) {
 70+ $this->output( "...doing $idCol from $blockStart to $blockEnd\n" );
 71+ $cond = "$idCol BETWEEN $blockStart AND $blockEnd
 72+ AND $idCol IS NOT NULL AND {$prefix}_sha1 IS NOT NULL";
 73+ $res = $db->select( $table, '*', $cond, __METHOD__ );
 74+
 75+ $db->begin();
 76+ foreach ( $res as $row ) {
 77+ if ( $table === 'archive' ) {
 78+ $rev = Revision::newFromArchiveRow( $row );
 79+ } else {
 80+ $rev = new Revision( $row );
 81+ }
 82+ $db->update( $table,
 83+ array( "{$prefix}_sha1" => Revision::base36Sha1( $rev->getRawText() ) ),
 84+ array( $idCol => $row->$idCol ),
 85+ __METHOD__ );
 86+ }
 87+ $db->commit();
 88+
 89+ $blockStart += $this->mBatchSize;
 90+ $blockEnd += $this->mBatchSize;
 91+ wfWaitForSlaves();
 92+ }
 93+ }
 94+}
 95+
 96+$maintClass = "PopulateRevisionSha1";
 97+require_once( RUN_MAINTENANCE_IF_MAIN );
Property changes on: trunk/phase3/maintenance/populateRevisionSha1.php
___________________________________________________________________
Added: svn:eol-style
198 + native
Index: trunk/phase3/maintenance/tables.sql
@@ -317,8 +317,11 @@
318318
319319 -- Key to revision.rev_id
320320 -- This field is used to add support for a tree structure (The Adjacency List Model)
321 - rev_parent_id int unsigned default NULL
 321+ rev_parent_id int unsigned default NULL,
322322
 323+ -- SHA-1 text content hash in base-36
 324+ rev_sha1 varbinary(32) NOT NULL default ''
 325+
323326 ) /*$wgDBTableOptions*/ MAX_ROWS=10000000 AVG_ROW_LENGTH=1024;
324327 -- In case tables are created as MyISAM, use row hints for MySQL <5.0 to avoid 4GB limit
325328
@@ -424,7 +427,10 @@
425428 ar_page_id int unsigned,
426429
427430 -- Original previous revision
428 - ar_parent_id int unsigned default NULL
 431+ ar_parent_id int unsigned default NULL,
 432+
 433+ -- SHA-1 text content hash in base-36
 434+ ar_sha1 varbinary(32) NOT NULL default ''
429435 ) /*$wgDBTableOptions*/;
430436
431437 CREATE INDEX /*i*/name_title_timestamp ON /*_*/archive (ar_namespace,ar_title,ar_timestamp);
Index: trunk/phase3/includes/installer/MysqlUpdater.php
@@ -186,6 +186,9 @@
187187 // 1.19
188188 array( 'addTable', 'config', 'patch-config.sql' ),
189189 array( 'addIndex', 'logging', 'type_action', 'patch-logging-type-action-index.sql'),
 190+ array( 'addField', 'revision', 'rev_sha1', 'patch-rev_sha1.sql' ),
 191+ array( 'addField', 'archive', 'ar_sha1', 'patch-ar_sha1.sql' ),
 192+ array( 'doPopulateRevSha1' )
190193 );
191194 }
192195
@@ -855,4 +858,14 @@
856859 $this->applyPatch( 'patch-user-newtalk-timestamp-null.sql' );
857860 $this->output( "done.\n" );
858861 }
 862+
 863+ protected function doPopulateRevSha1() {
 864+ if ( $this->updateRowExists( 'populate rev_sha1' ) ) {
 865+ $this->output( "...rev_sha1/ar_sha1 columns already populated.\n" );
 866+ return;
 867+ }
 868+
 869+ $task = $this->maintenance->runChild( 'PopulateRevisionSha1' );
 870+ $task->execute();
 871+ }
859872 }
Index: trunk/phase3/includes/AutoLoader.php
@@ -838,6 +838,7 @@
839839 'PopulateLogUsertext' => 'maintenance/populateLogUsertext.php',
840840 'PopulateParentId' => 'maintenance/populateParentId.php',
841841 'PopulateRevisionLength' => 'maintenance/populateRevisionLength.php',
 842+ 'PopulateRevisionSha1' => 'maintenance/PopulateRevisionSha1.php',
842843 'SevenZipStream' => 'maintenance/7zip.inc',
843844 'Sqlite' => 'maintenance/sqlite.inc',
844845 'UpdateCollation' => 'maintenance/updateCollation.php',

Follow-up revisions

RevisionCommit summaryAuthorDate
r94290Follow-up r94289: code changes to fill the new fields on insertion and select...aaron21:54, 11 August 2011
r94294Fix case of the new file added in r94289...platonides22:32, 11 August 2011
r94333Fix copy-paste mistake in r94289catrope10:00, 12 August 2011
r94345Follow-up r94289: SQLite support, unbreaks testsmaxsem13:16, 12 August 2011
r94362Fix for r94289: we want to skip rows with non-empty sha1, not non-NULL (which...aaron16:50, 12 August 2011
r94370* Added LoggedUpdateMaintenance subclass...aaron19:11, 12 August 2011
r94541Revert r94289, r94290, r94294, r94333, r94345, r94362, r94370 -- core schema ...brion18:24, 15 August 2011
r94547Followup r94541 (reverts of r94289 undiscussed core schema change and followu...brion18:52, 15 August 2011
r101021Reinstated r94289 et all - rev_sha1/ar_sha1 field for bug 21860aaron18:44, 27 October 2011

Comments

#Comment by Aaron Schulz (talk | contribs)   22:52, 11 August 2011

PG and SQLite still need updates.

#Comment by Krinkle (talk | contribs)   07:44, 12 August 2011

Brakes the unit tests for upgrades under SQLite:

DatabaseSqliteTest::testUpgrades
Mismatching columns for table "archive" upgrading from 1.15 to 1.19alpha
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
     [8] => ar_rev_id
-    [9] => ar_sha1
-    [10] => ar_text
-    [11] => ar_text_id
-    [12] => ar_timestamp
-    [13] => ar_title
-    [14] => ar_user
-    [15] => ar_user_text
+    [9] => ar_text
+    [10] => ar_text_id
+    [11] => ar_timestamp
+    [12] => ar_title
+    [13] => ar_user
+    [14] => ar_user_text
 )

/home/ci/cruisecontrol-bin-2.8.3/projects/mw/source/tests/phpunit/includes/db/DatabaseSqliteTest.php:218
/home/ci/cruisecontrol-bin-2.8.3/projects/mw/source/tests/phpunit/MediaWikiTestCase.php:64
/home/ci/cruisecontrol-bin-2.8.3/projects/mw/source/tests/phpunit/MediaWikiPHPUnitCommand.php:20
/home/ci/cruisecontrol-bin-2.8.3/projects/mw/source/tests/phpunit/phpunit.php:60

From your previous I comment I get that they're not added yet so this may be because of that, reporting here just in case.

#Comment by Catrope (talk | contribs)   10:31, 12 August 2011
+				AND $idCol IS NOT NULL AND {$prefix}_sha1 IS NOT NULL";

Since rev_sha1 and ar_sha1 are declared as NOT NULL, an IS NOT NULL condition on them is pointless.

Looks good to me otherwise, but I want Chad to look at how the updater calls the population script.

#Comment by 😂 (talk | contribs)   13:35, 12 August 2011

Updater is fine with r94345.

#Comment by Aaron Schulz (talk | contribs)   17:14, 12 August 2011

NULL check fixed in r94362.

#Comment by Krinkle (talk | contribs)   13:15, 12 August 2011

Running MySQL locally. Geting this on wiki pages:

A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
(SQL query hidden)
from within function "Revision::fetchFromConds". Database returned error "1054: Unknown column 'rev_sha1' in 'field list' (127.0.0.1)".

And from update.php:


Populating rev_len column
...doing rev_id from 1 to 200
A database query syntax error has occurred.
The last attempted database query was:
"SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1  FROM `revision`  WHERE (rev_id >= 1) AND (rev_id <= 200) AND (rev_len IS NULL)  "
from within function "PopulateRevisionLength::execute".
Database returned error "1054: Unknown column 'rev_sha1' in 'field list' (127.0.0.1)"
#Comment by Catrope (talk | contribs)   13:42, 12 August 2011

We figured out why on IRC: rev_len isn't populated yet on his wiki, so the rev_len population script runs before rev_sha1 is added, and barfs.

Chad says: "Really, populateRevLen should be moved to $postUpdateMaintenance."

#Comment by Krinkle (talk | contribs)   13:52, 12 August 2011

For what it's worth, that column does exist on my wiki and all rows in the revision table have a value in that column.

According to Roan, the reason it's trying to update is because it's not in updatelog.

My wiki runs trunk/phase3, revision: 94346 (first installed about 2 weeks ago).

#Comment by Catrope (talk | contribs)   13:08, 15 August 2011

This has been filed by someone else as bug 30383

#Comment by Krinkle (talk | contribs)   13:21, 12 August 2011

So to populate this hash column, the maintenance script would have to fetch the raw wikitext of all public and deleted revisions of all wikis.

That's going to open up a lot of possibilities (not sure how useful it is, but more re-using of text_oldids comes to mind when an edit results in the page text being equal to an earlier revision), but I'm curious how long that would take though on WMF. Weeks ?

#Comment by Krinkle (talk | contribs)   13:30, 12 August 2011

Hm.. I'm wondering though, why is this in the revision table and not in text table ? I'm guessing something with reachability through queries and indices, correct ?

#Comment by Brion VIBBER (talk | contribs)   18:19, 15 August 2011

Core table change with no discussion on wikitech-l? Needs immediate revert.

#Comment by Brion VIBBER (talk | contribs)   18:24, 15 August 2011

Reverted in r94541.

#Comment by MZMcBride (talk | contribs)   19:15, 15 August 2011

A bit of discussion at bug 21860 and bug 25312. I'm not sure the approval process for core schema changes is documented anywhere.

#Comment by Drdee (talk | contribs)   09:21, 3 September 2011

I have posted a first draft of a proposal on how to deploy this at http://strategy.wikimedia.org/wiki/Proposal:Implement_and_deploy_checksum_revision_table

Status & tagging log