r96722 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r96721‎ | r96722 | r96723 >
Date:15:26, 10 September 2011
Author:catrope
Status:old (Comments)
Tags:aft, schema 
Comment:
Epic followup to r94404, r94509: change the ArticleFeedback schema such that locking is no longer necessary. This should fix the counter drift problems, too. Thanks to Tim for pointing me in this direction.

* Change the schema of article_feedback to no longer have a primary key that ensures one rating per user per revision, but to have a primary key on the auto-incremented field aa_id instead
** This allows multiple ratings per user per revision in the article_feedback table, which makes the counter drift fix possible
** This schema change is so radical that the patch has to create another table with the new schema, copy the data over, and swap the tables
* Changed ApiArticleFeedback::insertUserRatings() to insert all the ratings rows at once, using one query. This ensures they get consecutive aa_id values
* Moved the SELECT in ApiArticleFeedback after the insertUserRatings() call, and added an "aa_id < $id" clause so it always looks at the rating set right before the one that was just inserted, even if another thread has already inserted another rating set in the meantime. This ensures that each rating delta is handled by exactly one thread, which should fix counter drift.
* Removed the useless and broken behavior of decrementing values in article_feedback_revisions. It was implemented inconsistently and never worked properly for that reason. This makes insertRevisionRating() much simpler
* Due to the insertRevisionRating() change. $lastRevision is no longer needed. To properly update the revision delta, though, we still need to look at whether the previous rating's revid matches this one's, and only take that rating into account if it does. The previous logic was probably broken and, now that I think about it, was probably the cause of all the weird behavior, but it's too late to go back now. I'll patch 1.17wmf1 for just this problem later, see if that helps
* Take this opportunity to redo the indexes on article_feedback so the queries run on it are actually indexed properly. Also drop useless aa_user_id conditions from queries.
Modified paths:
  • /trunk/extensions/ArticleFeedback/ArticleFeedback.hooks.php (modified) (history)
  • /trunk/extensions/ArticleFeedback/api/ApiArticleFeedback.php (modified) (history)
  • /trunk/extensions/ArticleFeedback/api/ApiQueryArticleFeedback.php (modified) (history)
  • /trunk/extensions/ArticleFeedback/sql/ArticleFeedback.sql (modified) (history)
  • /trunk/extensions/ArticleFeedback/sql/RecreatePK.sql (added) (history)

Diff [purge]

Index: trunk/extensions/ArticleFeedback/sql/RecreatePK.sql
@@ -0,0 +1,38 @@
 2+-- Add aa_id as the new primary key to article_feedback and drop the old one.
 3+-- Also change the indexing while we're at it
 4+
 5+-- In order to safely change the primary key even in replicated environments,
 6+-- we have to create a new table with the new structure, copy over the data,
 7+-- then rename the table. This is to ensure that the values of aa_id are
 8+-- consistent across all slaves.
 9+
 10+-- Create new table
 11+-- Would've used CREATE TABLE ... LIKE here but SQLite doesn't support ALTER TABLE ... DROP PRIMARY KEY
 12+-- so we're stuck with duplicating the table structure.
 13+CREATE TABLE /*_*/article_feedback2 (
 14+ aa_id integer unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
 15+ aa_page_id integer unsigned NOT NULL,
 16+ aa_user_id integer NOT NULL,
 17+ aa_user_text varbinary(255) NOT NULL,
 18+ aa_user_anon_token varbinary(32) NOT NULL DEFAULT '',
 19+ aa_revision integer unsigned NOT NULL,
 20+ aa_timestamp binary(14) NOT NULL DEFAULT '',
 21+ aa_rating_id int unsigned NOT NULL,
 22+ aa_rating_value int unsigned NOT NULL,
 23+ aa_design_bucket int unsigned NOT NULL DEFAULT 0
 24+) /*$wgDBTableOptions*/;
 25+CREATE INDEX /*i*/aa_page_user_token_id ON /*_*/article_feedback2 (aa_page_id, aa_user_text, aa_user_anon_token, aa_id);
 26+CREATE INDEX /*i*/aa_revision ON /*_*/article_feedback2 (aa_revision);
 27+CREATE INDEX /*i*/article_feedback_timestamp ON /*_*/article_feedback2 (aa_timestamp);
 28+
 29+-- Copy the data, ordered by the old primary key
 30+-- Need to specify the fields explicitly to avoid confusion with aa_id
 31+INSERT INTO /*_*/article_feedback2
 32+ (aa_page_id, aa_user_id, aa_user_text, aa_user_anon_token, aa_revision, aa_timestamp, aa_rating_id, aa_rating_value, aa_design_bucket)
 33+ SELECT aa_page_id, aa_user_id, aa_user_text, aa_user_anon_token, aa_revision, aa_timestamp, aa_rating_id, aa_rating_value, aa_design_bucket
 34+ FROM /*_*/article_feedback
 35+ ORDER BY aa_revision, aa_user_text, aa_rating_id, aa_user_anon_token;
 36+
 37+-- Drop the old table and rename the new table to the old name
 38+DROP TABLE /*_*/article_feedback;
 39+ALTER TABLE /*_*/article_feedback2 RENAME TO /*_*/article_feedback;
Property changes on: trunk/extensions/ArticleFeedback/sql/RecreatePK.sql
___________________________________________________________________
Added: svn:eol-style
140 + native
Index: trunk/extensions/ArticleFeedback/sql/ArticleFeedback.sql
@@ -14,6 +14,8 @@
1515
1616 -- Store article feedbacks (user rating per revision)
1717 CREATE TABLE IF NOT EXISTS /*_*/article_feedback (
 18+ -- Row ID (primary key)
 19+ aa_id integer unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
1820 -- Foreign key to page.page_id
1921 aa_page_id integer unsigned NOT NULL,
2022 -- User Id (0 if anon)
@@ -31,11 +33,10 @@
3234 -- Value of the rating (0 is "unrated", else 1-5)
3335 aa_rating_value int unsigned NOT NULL,
3436 -- Which rating widget the user was given. Default of 0 is the "old" design
35 - aa_design_bucket int unsigned NOT NULL DEFAULT 0,
36 - -- 1 vote per user per revision
37 - PRIMARY KEY (aa_revision, aa_user_text, aa_rating_id, aa_user_anon_token)
 37+ aa_design_bucket int unsigned NOT NULL DEFAULT 0
3838 ) /*$wgDBTableOptions*/;
39 -CREATE INDEX /*i*/aa_user_page_revision ON /*_*/article_feedback (aa_user_id, aa_page_id, aa_revision);
 39+CREATE INDEX /*i*/aa_page_user_token_id ON /*_*/article_feedback (aa_page_id, aa_user_text, aa_user_anon_token, aa_id);
 40+CREATE INDEX /*i*/aa_revision ON /*_*/article_feedback (aa_revision);
4041 -- Create an index on the article_feedback.aa_timestamp field
4142 CREATE INDEX /*i*/article_feedback_timestamp ON /*_*/article_feedback (aa_timestamp);
4243
Index: trunk/extensions/ArticleFeedback/ArticleFeedback.hooks.php
@@ -230,6 +230,15 @@
231231 $dir . '/sql/AddArticleFeedbackTimestampIndex.sql',
232232 true
233233 ) );
 234+
 235+ // This change recreates the PK on a new field. Check for that new field's existence
 236+ $updater->addExtensionUpdate( array(
 237+ 'addField',
 238+ 'article_feedback',
 239+ 'aa_id',
 240+ $dir . '/sql/RecreatePK.sql',
 241+ true
 242+ ) );
234243 }
235244 return true;
236245 }
Index: trunk/extensions/ArticleFeedback/api/ApiArticleFeedback.php
@@ -38,57 +38,60 @@
3939 }
4040
4141 $dbw = wfGetDB( DB_MASTER );
42 -
43 - // Query the latest ratings by this user for this page,
 42+
 43+ $pageId = $params['pageid'];
 44+ $revisionId = $params['revid'];
 45+
 46+ // Build array( rating ID => rating value )
 47+ $ratings = array();
 48+ foreach ( $wgArticleFeedbackRatings as $ratingID ) {
 49+ $ratings[$ratingID] = intval( $params["r{$ratingID}"] );
 50+ }
 51+ // Insert the new ratings
 52+ $id = $this->insertUserRatings( $pageId, $revisionId, $wgUser, $token, $ratings, $params['bucket'] );
 53+
 54+ // Query the previous ratings by this user for this page,
4455 // possibly for an older revision
45 - // Select from the master to prevent replag-induced bugs
 56+ // Use aa_id < $id to make sure we get the one before ours even if other ratings were inserted after
 57+ // insertUserRatings() but before selectRow(); this prevents race conditions from messing things up.
4658 $res = $dbw->select(
4759 'article_feedback',
4860 array( 'aa_rating_id', 'aa_rating_value', 'aa_revision' ),
4961 array(
50 - 'aa_user_id' => $wgUser->getId(),
5162 'aa_user_text' => $wgUser->getName(),
5263 'aa_page_id' => $params['pageid'],
5364 'aa_rating_id' => $wgArticleFeedbackRatings,
5465 'aa_user_anon_token' => $token,
 66+ 'aa_id < ' . intval( $id )
5567 ),
5668 __METHOD__,
5769 array(
58 - 'ORDER BY' => 'aa_revision DESC',
 70+ 'ORDER BY' => 'aa_id DESC',
5971 'LIMIT' => count( $wgArticleFeedbackRatings ),
6072 )
6173 );
62 -
6374 $lastRatings = array();
64 -
6575 foreach ( $res as $row ) {
6676 $lastRatings[$row->aa_rating_id]['value'] = $row->aa_rating_value;
6777 $lastRatings[$row->aa_rating_id]['revision'] = $row->aa_revision;
6878 }
69 -
70 - $pageId = $params['pageid'];
71 - $revisionId = $params['revid'];
72 -
73 - foreach( $wgArticleFeedbackRatings as $rating ) {
74 - $lastRating = false;
75 - $lastRevision = false;
 79+
 80+ foreach ( $wgArticleFeedbackRatings as $rating ) {
 81+ $lastPageRating = false;
 82+ $lastRevRating = false;
7683 if ( isset( $lastRatings[$rating] ) ) {
77 - $lastRating = intval( $lastRatings[$rating]['value'] );
78 - $lastRevision = intval( $lastRatings[$rating]['revision'] );
 84+ $lastPageRating = intval( $lastRatings[$rating]['value'] );
 85+ if ( intval( $lastRatings[$rating]['revision'] ) == $revisionId ) {
 86+ $lastRevRating = $lastPageRating;
 87+ }
7988 }
80 -
81 - $thisRating = false;
82 - if ( isset( $params["r{$rating}"] ) ) {
83 - $thisRating = intval( $params["r{$rating}"] );
84 - }
85 -
86 - $this->insertRevisionRating( $pageId, $revisionId, $lastRevision, $rating, ( $thisRating - $lastRating ),
87 - $thisRating, $lastRating
88 - );
 89+ $thisRating = intval( $params["r{$rating}"] );
8990
90 - $this->insertPageRating( $pageId, $rating, ( $thisRating - $lastRating ), $thisRating, $lastRating );
91 -
92 - $this->insertUserRatings( $pageId, $revisionId, $wgUser, $token, $rating, $thisRating, $params['bucket'] );
 91+ // Update counter tables
 92+ $this->insertRevisionRating( $pageId, $revisionId, $rating, $thisRating - $lastRevRating,
 93+ $thisRating, $lastRevRating
 94+ );
 95+ $this->insertPageRating( $pageId, $rating, $thisRating - $lastPageRating, $thisRating, $lastPageRating );
9396 }
9497
9598 $this->insertProperties( $revisionId, $wgUser, $token, $params );
@@ -157,13 +160,12 @@
158161 *
159162 * @param $pageId Integer: Page Id
160163 * @param $revisionId Integer: Revision Id
161 - * @param $lastRevision Integer: Revision Id of last rating
162164 * @param $ratingId Integer: Rating Id
163165 * @param $updateAddition Integer: Difference between user's last rating (if applicable)
164166 * @param $thisRating Integer|Boolean: Value of the Rating
165167 * @param $lastRating Integer|Boolean: Value of the last Rating
166168 */
167 - private function insertRevisionRating( $pageId, $revisionId, $lastRevision, $ratingId, $updateAddition, $thisRating, $lastRating ) {
 169+ private function insertRevisionRating( $pageId, $revisionId, $ratingId, $updateAddition, $thisRating, $lastRating ) {
168170 $dbw = wfGetDB( DB_MASTER );
169171
170172 // Try to insert a new "totals" row for this page,rev,rating set
@@ -179,58 +181,21 @@
180182 __METHOD__,
181183 array( 'IGNORE' )
182184 );
183 -
184 - // If that succeded in inserting a row, then we are for sure rating a previously unrated
185 - // revision, and we need to add more information about this rating to the new "totals" row,
186 - // as well as remove the previous rating values from the previous "totals" row
187 - if ( $dbw->affectedRows() ) {
188 - // If there was a previous rating, there should be a "totals" row for it's revision
189 - if ( $lastRating ) {
190 - // Deduct the previous rating values from the previous "totals" row
191 - $dbw->update(
192 - 'article_feedback_revisions',
193 - array(
194 - 'afr_total = afr_total - ' . intval( $lastRating ),
195 - 'afr_count = afr_count - 1',
196 - ),
197 - array(
198 - 'afr_page_id' => $pageId,
199 - 'afr_rating_id' => $ratingId,
200 - 'afr_revision' => $lastRevision
201 - ),
202 - __METHOD__
203 - );
204 - }
205 - // Add this rating's values to the new "totals" row
206 - $dbw->update(
207 - 'article_feedback_revisions',
208 - array(
209 - 'afr_total' => $thisRating,
210 - 'afr_count' => $thisRating ? 1 : 0,
211 - ),
212 - array(
213 - 'afr_page_id' => $pageId,
214 - 'afr_rating_id' => $ratingId,
215 - 'afr_revision' => $revisionId,
216 - ),
217 - __METHOD__
218 - );
219 - } else {
220 - // Apply the difference between the previous and new ratings to the current "totals" row
221 - $dbw->update(
222 - 'article_feedback_revisions',
223 - array(
224 - 'afr_total = afr_total + ' . $updateAddition,
225 - 'afr_count = afr_count + ' . $this->getCountChange( $lastRating, $thisRating ),
226 - ),
227 - array(
228 - 'afr_page_id' => $pageId,
229 - 'afr_rating_id' => $ratingId,
230 - 'afr_revision' => $revisionId,
231 - ),
232 - __METHOD__
233 - );
234 - }
 185+
 186+ // Apply the difference between the previous and new ratings to the current "totals" row
 187+ $dbw->update(
 188+ 'article_feedback_revisions',
 189+ array(
 190+ 'afr_total = afr_total + ' . $updateAddition,
 191+ 'afr_count = afr_count + ' . $this->getCountChange( $lastRating, $thisRating ),
 192+ ),
 193+ array(
 194+ 'afr_page_id' => $pageId,
 195+ 'afr_rating_id' => $ratingId,
 196+ 'afr_revision' => $revisionId,
 197+ ),
 198+ __METHOD__
 199+ );
235200 }
236201 /**
237202 * Calculate the difference between the previous rating and this one
@@ -247,55 +212,45 @@
248213 }
249214
250215 /**
251 - * Inserts (or Updates, where appropriate) the users ratings for a specific revision
 216+ * Inserts the user's ratings for a specific revision
252217 *
253218 * @param $pageId Integer: Page Id
254219 * @param $revisionId Integer: Revision Id
255220 * @param $user User: Current User object
256221 * @param $token Array: Token if necessary
257 - * @param $ratingId Integer: Rating Id
258 - * @param $ratingValue Integer: Value of the Rating
 222+ * @param $ratings Array: Keys are rating IDs, values are rating values
259223 * @param $bucket Integer: Which rating widget was the user shown
 224+ * @return int Return value of $dbw->insertID(). This is the aa_id of the first (MySQL) or last (SQLite) inserted row
260225 */
261 - private function insertUserRatings( $pageId, $revisionId, $user, $token, $ratingId, $ratingValue, $bucket ) {
 226+ private function insertUserRatings( $pageId, $revisionId, $user, $token, $ratings, $bucket ) {
262227 $dbw = wfGetDB( DB_MASTER );
263228
264229 $timestamp = $dbw->timestamp();
265 -
266 - $dbw->insert(
267 - 'article_feedback',
268 - array(
 230+
 231+ $rows = array();
 232+ foreach ( $ratings as $ratingID => $ratingValue ) {
 233+ $rows[] = array(
269234 'aa_page_id' => $pageId,
270235 'aa_user_id' => $user->getId(),
271236 'aa_user_text' => $user->getName(),
272237 'aa_user_anon_token' => $token,
273238 'aa_revision' => $revisionId,
274239 'aa_timestamp' => $timestamp,
275 - 'aa_rating_id' => $ratingId,
 240+ 'aa_rating_id' => $ratingID,
276241 'aa_rating_value' => $ratingValue,
277242 'aa_design_bucket' => $bucket,
278 - ),
279 - __METHOD__,
280 - array( 'IGNORE' )
281 - );
282 -
283 - if ( !$dbw->affectedRows() ) {
284 - $dbw->update(
285 - 'article_feedback',
286 - array(
287 - 'aa_timestamp' => $timestamp,
288 - 'aa_rating_value' => $ratingValue,
289 - ),
290 - array(
291 - 'aa_page_id' => $pageId,
292 - 'aa_user_text' => $user->getName(),
293 - 'aa_revision' => $revisionId,
294 - 'aa_rating_id' => $ratingId,
295 - 'aa_user_anon_token' => $token,
296 - ),
297 - __METHOD__
298243 );
299244 }
 245+
 246+ // In MySQL, there is native support for multi-row inserts and our rows
 247+ // will automatically get consecutive insertIDs. In SQLite this seems
 248+ // to be the case if you use a transaction, but I couldn't find anything
 249+ // on the web that states whether this is true.
 250+ $dbw->begin();
 251+ $dbw->insert( 'article_feedback', $rows, __METHOD__ );
 252+ $dbw->commit();
 253+
 254+ return $dbw->insertID();
300255 }
301256
302257 /**
Index: trunk/extensions/ArticleFeedback/api/ApiQueryArticleFeedback.php
@@ -187,14 +187,13 @@
188188 array(
189189 'aa_page_id' => $params['pageid'],
190190 'aa_rating_id' => $wgArticleFeedbackRatings,
191 - 'aa_user_id' => $wgUser->getId(),
192191 'aa_user_text' => $wgUser->getName(),
193192 'aa_user_anon_token' => $this->getAnonToken( $params ),
194193 ),
195194 __METHOD__,
196195 array(
197196 'LIMIT' => count( $wgArticleFeedbackRatings ),
198 - 'ORDER BY' => 'aa_revision DESC',
 197+ 'ORDER BY' => 'aa_id DESC',
199198 ),
200199 array(
201200 'article_feedback_ratings' => array( 'LEFT JOIN', array( 'aar_id=aa_rating_id' ) )

Sign-offs

UserFlagDate
😂inspected21:29, 28 September 2011

Follow-up revisions

RevisionCommit summaryAuthorDate
r982321.17wmf1: Experimental counter drift fix for AFT, much simpler approach than ...catrope17:28, 27 September 2011
r110959Revert most of r96722. The massive schema change was way overkill given that ...catrope20:21, 8 February 2012
r110962Followup r110959: reapply the good parts of r96722:...catrope20:52, 8 February 2012
r110967Followup r110959: reintroduce another good part of r96722 (sort of): add an i...catrope21:44, 8 February 2012

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r94404Hopefully prevent race conditions in ArticleFeedback by reading from the mast...catrope10:46, 13 August 2011
r94509Followup r94404: remove LOCK IN SHARE MODE because locking reads are evil, pe...catrope13:06, 15 August 2011

Comments

#Comment by Catrope (talk | contribs)   15:37, 10 September 2011

Looks like 1.17wmf1 would be mostly fixed with just these few lines: https://gist.github.com/1208442

Status & tagging log