r92200 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: < r92199 | r92200 | r92201 >
Date: 21:33, 14 July 2011
Author: raindrift
Status: resolved (Comments)
Tags:
Comment:
Fixed a condition where re-uploading a file that's already stashed causes breakage. This mirrors the previous session-based behavior as closely as possible.
Modified paths:
  • /trunk/phase3/includes/upload/UploadStash.php (modified) (history)

Diff

Index: trunk/phase3/includes/upload/UploadStash.php
@@ -238,7 +238,31 @@
 		// insert the file metadata into the db.
 		wfDebug( __METHOD__ . " inserting $stashPath under $key\n" );
 		$dbw = $this->repo->getMasterDb();
+
+		// select happens on the master so this can all be in a transaction, which
+		// avoids a race condition that's likely with multiple people uploading from the same
+		// set of files
+		$dbw->begin();
+		// first, check to see if it's already there.
+		$row = $dbw->selectRow(
+			'uploadstash',
+			'us_user, us_timestamp',
+			array( 'us_key' => $key ),
+			__METHOD__
+		);
 
+		// The current user can't have this key if:
+		// - the key is owned by someone else and
+		// - the age of the key is less than REPO_AGE
+		if ( is_object( $row ) ) {
+			if ( $row->us_user != $this->userId &&
+				wfTimestamp( TS_UNIX, $row->us_timestamp ) > time() - UploadStash::REPO_AGE * 3600
+			) {
+				$dbw->rollback();
+				throw new UploadStashWrongOwnerException( "Attempting to upload a duplicate of a file that someone else has stashed" );
+			}
+		}
+
 		$this->fileMetadata[$key] = array(
 			'us_user' => $this->userId,
 			'us_key' => $key,
@@ -255,13 +279,15 @@
 			'us_timestamp' => $dbw->timestamp(),
 			'us_status' => 'finished'
 		);
-
-
-		$dbw->insert(
+
+		// if a row exists but previous checks on it passed, let the current user take over this key.
+		$dbw->replace(
 			'uploadstash',
+			'us_key',
 			$this->fileMetadata[$key],
 			__METHOD__
 		);
+		$dbw->commit();
 
 		// store the insertid in the class variable so immediate retrieval (possibly laggy) isn't necesary.
 		$this->fileMetadata[$key]['us_id'] = $dbw->insertId();
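For readers unfamiliar with REPLACE semantics: assuming a unique index on us_key, the $dbw->replace() call in the patch deletes any conflicting row and inserts the new one, which is how a later uploader "takes over" a key once the ownership and age checks pass. A toy illustration in Python against SQLite (whose INSERT OR REPLACE is the equivalent construct; the schema here is invented, only the table and column names follow the patch):

```python
import sqlite3

# Toy model of the uploadstash table; the unique index on us_key is
# what makes REPLACE meaningful.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE uploadstash (us_key TEXT PRIMARY KEY, us_user INTEGER)")

# User 1 stashes a file under some key.
db.execute("INSERT OR REPLACE INTO uploadstash VALUES (?, ?)", ("k1", 1))

# Later, user 2 passes the ownership/age checks, so the REPLACE
# silently takes over the row rather than failing on the unique key.
db.execute("INSERT OR REPLACE INTO uploadstash VALUES (?, ?)", ("k1", 2))

owner = db.execute(
    "SELECT us_user FROM uploadstash WHERE us_key = ?", ("k1",)
).fetchone()[0]
print(owner)  # 2: the key now belongs to the second uploader
```

Note that the replaced row is deleted and re-inserted, so there is still only one row per key afterwards.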

Follow-up revisions

Revision | Commit summary | Author | Date
r94594 | Removed the ability to pass a key into stashFile(), which simplifies the stas... | raindrift | 23:58, 15 August 2011

Comments

#Comment by NeilK (talk | contribs)   03:17, 23 July 2011

I don't understand, under what conditions can this problem occur?

There is not supposed to be any check for uniqueness of file contents in the UploadStash. As you mentioned, re-uploads are very likely, even from multiple people, and that is not necessarily a mistake at this stage.

Files just need to have unique names; generally the unique name is set by the API client anyway, and if not, you have a good function to create one.

#Comment by Raindrift (talk | contribs)   16:37, 25 July 2011

This was a problem when we were using the content hash as the filekey. The key is a unique string as of r92269, so it's only relevant if the user-supplied key is non-unique. That's a pretty rare edge case, but I figured since it was possible I'd leave the code in.

#Comment by Catrope (talk | contribs)   11:20, 15 August 2011

With r92269 in mind, this is fine. However, the error message is misleading.

#Comment by Catrope (talk | contribs)   11:25, 15 August 2011

The database logic is still vulnerable to race conditions: if a key is held by A but has expired, B and C can try to take it over at the same time. Both will be allowed to, both will execute a REPLACE query, and one will win. You can prevent this by using FOR UPDATE in your SELECT. My personal preference for ensuring such things atomically is an INSERT IGNORE followed by a conditional UPDATE that includes the user and age criteria in the WHERE clause. That said, SELECT FOR UPDATE should work fine here, and locking is not a concern because you're only locking a single row, if that: in the vast majority of cases you'll select and lock zero rows.
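The INSERT IGNORE + conditional UPDATE pattern described here can be sketched as follows. This is illustrative Python against SQLite (INSERT OR IGNORE standing in for MySQL's INSERT IGNORE); the stash() helper, the REPO_AGE value, and the schema are assumptions, with only the column names taken from the patch.

```python
import sqlite3
import time

REPO_AGE = 6  # hours; hypothetical value for illustration

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE uploadstash "
    "(us_key TEXT PRIMARY KEY, us_user INTEGER, us_timestamp INTEGER)"
)

def stash(db, key, user, now):
    # Step 1: try to claim the key; a silent no-op if the row already exists.
    db.execute(
        "INSERT OR IGNORE INTO uploadstash VALUES (?, ?, ?)", (key, user, now)
    )
    # Step 2: take over the row only if we already own it or it has expired.
    # Both criteria live in the WHERE clause, so the check and the write
    # are one atomic statement -- no window for another uploader.
    cur = db.execute(
        "UPDATE uploadstash SET us_user = ?, us_timestamp = ? "
        "WHERE us_key = ? AND (us_user = ? OR us_timestamp <= ?)",
        (user, now, key, user, now - REPO_AGE * 3600),
    )
    if cur.rowcount == 0:
        raise RuntimeError("key is held by someone else and has not expired")

now = int(time.time())
stash(db, "k1", 1, now)  # user 1 claims the key
try:
    stash(db, "k1", 2, now)  # user 2 is rejected while the key is fresh
except RuntimeError as e:
    print(e)
stash(db, "k1", 2, now + REPO_AGE * 3600 + 1)  # succeeds after expiry
print(db.execute("SELECT us_user FROM uploadstash").fetchone()[0])  # 2
```

The design point is that the ownership and age tests happen inside the UPDATE itself, so two concurrent takeover attempts cannot both pass the check.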

#Comment by Catrope (talk | contribs)   11:28, 15 August 2011

Similarly, removeFile() is vulnerable to a race condition if the file is taken over in between checking ownership and deleting it. It's much easier to just do a conditional DELETE and put the ownership criterion in the WHERE clause.
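A conditional DELETE along those lines might look like this. A sketch only: the remove_file() helper and schema are hypothetical, with the column names taken from the patch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE uploadstash (us_key TEXT PRIMARY KEY, us_user INTEGER)")
db.execute("INSERT INTO uploadstash VALUES ('k1', 1)")

def remove_file(db, key, user):
    # One atomic statement: the row is deleted only if the caller still
    # owns the key, so a concurrent takeover cannot slip in between a
    # separate ownership check and the delete.
    cur = db.execute(
        "DELETE FROM uploadstash WHERE us_key = ? AND us_user = ?",
        (key, user),
    )
    return cur.rowcount == 1

ok_wrong = remove_file(db, "k1", 2)  # non-owner: nothing deleted
ok_owner = remove_file(db, "k1", 1)  # owner: delete succeeds
print(ok_wrong, ok_owner)  # False True
```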

#Comment by NeilK (talk | contribs)   13:42, 15 August 2011

Maybe I'm missing something basic here, but it seems to me you are discussing race conditions that are unlikely to occur in the remaining lifetime of our universe. The keys are random. If our PRNG is broken we have worse problems than a file being overwritten.
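For a rough sense of scale behind this claim: assuming the random part of a stash key carries about 128 bits of entropy (an assumption; the thread doesn't pin down the key format), the birthday bound caps the collision probability among n keys at n(n-1)/(2 * 2^128):

```python
# Back-of-the-envelope birthday bound for random key collisions.
# bits=128 is an assumed key entropy, not taken from the source.
def collision_bound(n, bits=128):
    # Union bound over all n*(n-1)/2 pairs of keys.
    return n * (n - 1) / (2 * 2**bits)

# Even a trillion stashed uploads stay far below any plausible concern:
print(collision_bound(10**12) < 1e-13)  # True
```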

#Comment by Catrope (talk | contribs)   13:48, 15 August 2011

Ian suggests it's possible to supply non-random keys:

    This was a problem when we were using the content hash as the filekey. The key is a unique string as of r92269, so it's only relevant if the user-supplied key is non-unique. That's a pretty rare edge case, but I figured since it was possible I'd leave the code in.

I agree that race conditions in the random keys can be considered impossible.

#Comment by NeilK (talk | contribs)   13:53, 15 August 2011

I meant you in the plural sense of both of you. ;)

But Ian & I discussed this in person and he really wanted to be sure (perhaps it is worth it, if other things about how we choose the key change?), so I didn't demand that the code be removed.

#Comment by Catrope (talk | contribs)   13:57, 15 August 2011

I'm personally fine with somewhat ungraceful handling of a very rare edge case, duplicate random keys, whose probability of ever occurring in the remaining lifetime of the universe is very low, as long as the consequences aren't too confusing. Just throwing an error is fine; overwriting things is not. But if you're going to write code to handle this case and resolve it in a nice way, you should at least write correct code :)

#Comment by Raindrift (talk | contribs)   18:29, 15 August 2011

My code generates the key for all the use cases in UW. The key is guaranteed unique, and I'm not concerned about a race condition there. I wrote this code when that wasn't true.

However, it's possible to pass in a key from elsewhere. That is, I don't have any guarantee that this key will be unique at all, since I'm not always generating it. For example, if the externally-generated key is based on the local filename, collisions could arise. Since the key is always owned by the uploading user, it seemed almost certain that a key collision would mean a newer version of the file, which is why the default is to overwrite.

I'd be down to remove all this code, as long as I also remove the $key param from UploadStash::stashFile(). I'm concerned that doing so might break something external somewhere, though, which is why I left it alone.

#Comment by NeilK (talk | contribs)   18:57, 15 August 2011

We're close to being able to remove the $key parameter entirely. The only way (as far as I can tell) that $key is ever necessarily non-null is the case of includes/job/UploadFromUrlJob.php. In fact, UploadFromUrlJob.php is already outmoded: it assumes that the session is where the stashed files are. I believe it can be simplified so that it doesn't pass a key at all, and instead returns the key generated.

#Comment by Raindrift (talk | contribs)   23:59, 15 August 2011

Sounds like a plan! Implemented in r94594

Status & tagging log