r45431 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r45431
Date: 02:10, 6 January 2009
Author: valhallasw
Status: reverted
Comment:
Updated deleteLinksFromNonexistent function:
- refreshLinks.inc:
* New algorithm, following Brion's description in bug #16112: instead of one big delete, the work is split into batches of (by default) 100 incorrect page_ids to remove.
* Added function parameters

- refreshLinks.php:
* New command-line parameter to set the number of page_ids to clean per batch.
* Reinstated the deleteLinksFromNonexistent run
Modified paths:
  • /trunk/phase3/maintenance/refreshLinks.inc (modified)
  • /trunk/phase3/maintenance/refreshLinks.php (modified)
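
The batching described in the commit comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the committed code: `buildBatchDeletes` is a hypothetical helper, it only builds SQL strings and never touches a database, and it uses `array_chunk` with an `intval` cast in place of the counter-and-modulo loop shown in the diff.

```php
<?php
/**
 * Hypothetical helper (not part of refreshLinks.inc): split a list of ids
 * into batches and build one DELETE statement per batch.
 */
function buildBatchDeletes( $table, $field, array $ids, $batchSize = 100 ) {
	$statements = array();
	// array_chunk() returns arrays of at most $batchSize elements. An empty
	// input produces no chunks, so an empty "IN ()" clause is never built.
	foreach ( array_chunk( $ids, $batchSize ) as $chunk ) {
		// Cast to int so that only numeric ids end up in the SQL text.
		$safe = array_map( 'intval', $chunk );
		$statements[] = "DELETE FROM $table WHERE $field IN (" . implode( ',', $safe ) . ")";
	}
	return $statements;
}

// 250 ids with a batch size of 100 give batches of 100, 100 and 50.
$sql = buildBatchDeletes( 'pagelinks', 'pl_from', range( 1, 250 ), 100 );
print count( $sql ) . " statements\n"; // prints "3 statements"
```

Because each batch is a fresh array, no accumulator has to be reset between batches. The loop in the diff resets `$list` to an empty string rather than an empty array, which may deserve a second look when the number of ids is an exact multiple of the batch size.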

Diff

Index: trunk/phase3/maintenance/refreshLinks.inc
@@ -136,13 +136,23 @@
 	$dbw->immediateCommit();
 }
 
-function deleteLinksFromNonexistent( $maxLag = 0 ) {
+/*
+ * Removes non-existing links from pages from pagelinks, imagelinks,
+ * categorylinks, templatelinks and externallinks tables.
+ *
+ * @param $maxLag
+ * @param $batchSize The size of deletion batches
+ *
+ * @author Merlijn van Deen <valhallasw@arctus.nl>
+ */
+function deleteLinksFromNonexistent( $maxLag = 0, $batchSize = 100 ) {
 	$fname = 'deleteLinksFromNonexistent';
-
 	wfWaitForSlaves( $maxLag );
-
+
 	$dbw = wfGetDB( DB_MASTER );
-
+	$dbr = wfGetDB( DB_SLAVE );
+	$dbr->bufferResults(false);
+
 	$linksTables = array(
 		'pagelinks' => 'pl_from',
 		'imagelinks' => 'il_from',
@@ -150,27 +160,65 @@
 		'templatelinks' => 'tl_from',
 		'externallinks' => 'el_from',
 	);
-
-	$page = $dbw->tableName( 'page' );
-
-
+
+
+	$readPage = $dbr->tableName( 'page' );
 	foreach ( $linksTables as $table => $field ) {
-		if ( !$dbw->ping() ) {
-			print "DB disconnected, reconnecting...";
-			while ( !$dbw->ping() ) {
-				print ".";
-				sleep(10);
-			}
+		$readLinks = $dbr->tableName( $table );
+
+		$sql = "SELECT DISTINCT( $field ) FROM $readLinks LEFT JOIN $readPage ON $field=page_id WHERE page_id IS NULL;";
+		print "Retrieving illegal entries from $table: \tRUNNING";
+
+		$results = $dbr->query( $sql, $fname . ':' . $readLinks );
+		print "\x08\x08\x08\x08\x08\x08\x08" . $results->numRows() . " illegal " . $field. "s. ";
+
+		if ( $results->numRows() == 0 ) {
 			print "\n";
+			continue;
 		}
+
+		$counter = 0;
+		$list = array();
+		print "Removing illegal links: 1..";
+		foreach( $results as $row ) {
+			$counter++;
+			$list[] = $row->$field;
+			if ( ( $counter % $batchSize ) == 0 ) {
+				print $counter . "..";
+				deleteBatch($dbw, $table, $field, $list);
+				$list = '';
+			}
+		}
+		print $counter . "\n";
+		deleteBatch($dbw, $table, $field, $list);
+	}
+}
 
-		$pTable = $dbw->tableName( $table );
-		$sql = "DELETE $pTable FROM $pTable LEFT JOIN $page ON page_id=$field WHERE page_id IS NULL";
+/* Deletes a batch of items from a table.
+ * Runs the query: DELETE FROM <$table> WHERE <$field> IN (<$list>)
+ *
+ * @param $dbw Database Database object to run the DELETE query on
+ * @param $table table to work on; will be converted via $dbw->tableName.
+ * @param $field column to search in
+ * @param $list values to remove. Array with SQL-safe (!) values.
+ *
+ * @author Merlijn van Deen <valhallasw@arctus.nl>
+ */
+function deleteBatch($dbw, $table, $field, $list) {
+	if (count($list) == 0) return;
+
+	$masterLinks = $dbw->tableName( $table );
+	$fname = "deleteBatch:masterLinks";
+
+	if ( !$dbw->ping() ) {
+		print "\nDB disconnected, reconnecting...";
+		while ( !$dbw->ping() ) {
+			print ".";
+			sleep(10);
+		}
+		print "\n";
+	}
 
-		print "Deleting $table from non-existent articles...";
-		$dbw->query( $sql, $fname );
-		print " fixed " .$dbw->affectedRows() . " row(s)\n";
-	}
+	$sql = "DELETE FROM $masterLinks WHERE $field IN (" . join("," , $list) . ");";
+	$dbw->query($sql, $fname);
 }
-
-?>
Index: trunk/phase3/maintenance/refreshLinks.php
@@ -18,14 +18,16 @@
         [--new-only] [--redirects-only]
     php refreshLinks.php [<start>] [-e <end>] [-m <maxlag>] --old-redirects-only
 
-    --help                : This help message
-    --dfn-only            : Delete links from nonexistent articles only
-    --new-only            : Only affect articles with just a single edit
-    --redirects-only      : Only fix redirects, not all links
-    --old-redirects-only  : Only fix redirects with no redirect table entry
-    -m <number>           : Maximum replication lag
-    <start>               : First page id to refresh
-    -e <number>           : Last page id to refresh
+    --help                : This help message
+    --dfn-only            : Delete links from nonexistent articles only
+    --batch-size <number> : The delete batch size when removing links from
+                            nonexistent articles (default 100)
+    --new-only            : Only affect articles with just a single edit
+    --redirects-only      : Only fix redirects, not all links
+    --old-redirects-only  : Only fix redirects with no redirect table entry
+    -m <number>           : Maximum replication lag
+    <start>               : First page id to refresh
+    -e <number>           : Last page id to refresh
 
 TEXT;
 	exit(0);
@@ -44,10 +46,8 @@
 }
 // this bit's bad for replication: disabling temporarily
 // --brion 2005-07-16
-//deleteLinksFromNonexistent();
+deleteLinksFromNonexistent($options['m'], $options['batch-size']);
 
 if ( $options['globals'] ) {
 	print_r( $GLOBALS );
 }
-
-

Follow-up revisions

Revision | Commit summary | Author | Date
r45482 | Pull back r45431 for the moment "Updated deleteLinksFromNonexistent function:... | brion | 03:33, 7 January 2009
r45514 | Recommit of r45431 with these changes:... | valhallasw | 19:51, 7 January 2009

Comments

Comment by Danny B., 04:42, 6 January 2009

The help text should include an example of a command that uses the --batch-size parameter.

Also, shouldn't the usage either read --batch-size=<number>, or batch-size be added to $optionsWithArgs?
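
To illustrate why registering the option matters, here is a simplified stand-in for a command-line parser. It is an assumption-laden sketch and not MediaWiki's commandLine.inc: `parseOptions` is a hypothetical function, and the only behavior it models is that an option named in `$optionsWithArgs` consumes the following argument as its value, while `--name=value` works without registration.

```php
<?php
// Hypothetical, simplified parser for illustration only.
function parseOptions( array $argv, array $optionsWithArgs ) {
	$options = array();
	$args = array();
	for ( $i = 0; $i < count( $argv ); $i++ ) {
		$arg = $argv[$i];
		if ( substr( $arg, 0, 2 ) !== '--' ) {
			// Anything that is not an option is a positional argument.
			$args[] = $arg;
			continue;
		}
		$name = substr( $arg, 2 );
		if ( strpos( $name, '=' ) !== false ) {
			// --name=value form needs no registration.
			list( $name, $value ) = explode( '=', $name, 2 );
			$options[$name] = $value;
		} elseif ( in_array( $name, $optionsWithArgs, true ) && isset( $argv[$i + 1] ) ) {
			// --name value form: the next argument is consumed as the value.
			$options[$name] = $argv[++$i];
		} else {
			// Unregistered options are treated as plain flags.
			$options[$name] = 1;
		}
	}
	return array( $options, $args );
}

$argv = array( '--dfn-only', '--batch-size', '500' );

// Registered: 500 becomes the value of batch-size.
list( $opts, $rest ) = parseOptions( $argv, array( 'm', 'e', 'batch-size' ) );
print $opts['batch-size'] . "\n"; // prints "500"

// Not registered: batch-size is a bare flag and 500 is left as a positional argument.
list( $opts, $rest ) = parseOptions( $argv, array( 'm', 'e' ) );
print $rest[0] . "\n"; // prints "500"
```

In a parser that behaves like this sketch, leaving batch-size unregistered would let the number fall through as a positional argument, where a script such as refreshLinks.php would read it as <start>.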

Comment by Brion VIBBER, 03:33, 7 January 2009

There's some funny output with \x08 stuff, and I don't want to fiddle with it just now...

Reverting for now in r45482
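
The comment does not say what exactly looked funny, so the following is only one plausible illustration. `\x08` is the backspace character: on a terminal it moves the cursor left without erasing anything, and in a log file or any other non-terminal sink it stays in the output as a raw control byte. The sketch below uses a hypothetical `renderOnTerminal` function to approximate what a terminal would display.

```php
<?php
// Hypothetical approximation of how a terminal renders backspaces:
// "\x08" moves the cursor one column left; other bytes overwrite in place.
function renderOnTerminal( $bytes ) {
	$line = '';
	$cursor = 0;
	for ( $i = 0; $i < strlen( $bytes ); $i++ ) {
		$ch = $bytes[$i];
		if ( $ch === "\x08" ) {
			$cursor = max( 0, $cursor - 1 );
		} else {
			// Overwrite the character under the cursor, or append at the end.
			$line = substr( $line, 0, $cursor ) . $ch . substr( $line, $cursor + 1 );
			$cursor++;
		}
	}
	return $line;
}

$raw = "RUNNING" . str_repeat( "\x08", 7 ) . "5 rows";

// A terminal overwrites only as many columns as the new text is long.
print renderOnTerminal( $raw ) . "\n"; // prints "5 rowsG"

// A log file keeps all seven control bytes.
print substr_count( $raw, "\x08" ) . "\n"; // prints "7"
```

Printing the status on a new line, or printing the result only once it is known, avoids both effects.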

Comment by Valhallasw, 20:37, 7 January 2009

Removed the backspaces from the output in r45514. Added batch-size to the example and to the $optionsWithArgs array in r45516.

Sorry for that double commit :)
