r93720 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: < r93719 | r93720 | r93721 >
Date: 10:13, 2 August 2011
Author: j
Status: resolved (Comments)
Tags:
Comment:
Extend the upload API with an option to upload files in chunks, using UploadStash to track the unfinished upload.
Modified paths:
  • /trunk/phase3/includes/api/ApiUpload.php (modified) (history)
  • /trunk/phase3/includes/upload/UploadFromStash.php (modified) (history)

Diff

Index: trunk/phase3/includes/upload/UploadFromStash.php
@@ -130,4 +130,36 @@
 131  131  		return $rv;
 132  132  	}
 133  133  
 134       -}
           \ No newline at end of file
      134  +	/**
      135  +	 * Append a chunk to the temporary file.
      136  +	 *
      137  +	 * @return void
      138  +	 */
      139  +	public function appendChunk($chunk, $chunkSize, $offset) {
      140  +		//to use $this->getFileSize() here, db needs to be updated
      141  +		//in appendToUploadFile for that
      142  +		$fileSize = $this->stash->getFile( $this->mFileKey )->getSize();
      143  +		if ( $fileSize + $chunkSize > $this->getMaxUploadSize()) {
      144  +			$status = Status::newFatal( 'file-too-large' );
      145  +		} else {
      146  +			//append chunk
      147  +			if ( $fileSize == $offset ) {
      148  +				$status = $this->appendToUploadFile( $chunk,
      149  +					$this->mVirtualTempPath );
      150  +			} else {
      151  +				$status = Status::newFatal( 'invalid-chunk-offset' );
      152  +			}
      153  +		}
      154  +		return $status;
      155  +	}
      156  +
      157  +	/**
      158  +	 * Append the final chunk and ready file for parent::performUpload()
      159  +	 * @return void
      160  +	 */
      161  +	public function finalizeFile() {
      162  +		$this->appendFinish ( $this->mVirtualTempPath );
      163  +		$this->cleanupTempFile();
      164  +		$this->mTempPath = $this->getRealPath( $this->mVirtualTempPath );
      165  +	}
      166  +}
Index: trunk/phase3/includes/api/ApiUpload.php
@@ -58,6 +58,7 @@
  59   59  		$request = $this->getMain()->getRequest();
  60   60  		// Add the uploaded file to the params array
  61   61  		$this->mParams['file'] = $request->getFileName( 'file' );
       62  +		$this->mParams['chunk'] = $request->getFileName( 'chunk' );
  62   63  
  63   64  		// Copy the session key to the file key, for backward compatibility.
  64   65  		if( !$this->mParams['filekey'] && $this->mParams['sessionkey'] ) {
@@ -85,7 +86,14 @@
  86   87  		}
  87   88  
  88   89  		// Check if the uploaded file is sane
  89        -		$this->verifyUpload();
       90  +		if ( $this->mParams['chunk'] ) {
       91  +			$maxSize = $this->mUpload->getMaxUploadSize( );
       92  +			if( $this->mParams['filesize'] > $maxSize ) {
       93  +				$this->dieUsage( 'The file you submitted was too large', 'file-too-large' );
       94  +			}
       95  +		} else {
       96  +			$this->verifyUpload();
       97  +		}
  90   98  
  91   99  
  92  100  		// Check if the user has the rights to modify or overwrite the requested title
@@ -113,6 +121,26 @@
 114  122  			} catch ( MWException $e ) {
 115  123  				$result['warnings']['stashfailed'] = $e->getMessage();
 116  124  			}
      125  +		} elseif ( $this->mParams['chunk'] ) {
      126  +			$result['result'] = 'Continue';
      127  +			$chunk = $request->getFileTempName( 'chunk' );
      128  +			$chunkSize = $request->getFileSize( 'chunk' );
      129  +			if ($this->mParams['offset'] == 0) {
      130  +				$result['filekey'] = $this->performStash();
      131  +			} else {
      132  +				$status = $this->mUpload->appendChunk($chunk, $chunkSize,
      133  +					$this->mParams['offset']);
      134  +				if ( !$status->isGood() ) {
      135  +					$this->dieUsage( $status->getWikiText(), 'stashfailed' );
      136  +				} else {
      137  +					$result['filekey'] = $this->mParams['filekey'];
      138  +					if($this->mParams['offset'] + $chunkSize == $this->mParams['filesize']) {
      139  +						$this->mUpload->finalizeFile();
      140  +						$result['result'] = 'Done';
      141  +					}
      142  +				}
      143  +			}
      144  +			$result['offset'] = $this->mParams['offset'] + $chunkSize;
 117  145  		} elseif ( $this->mParams['stash'] ) {
 118  146  			// Some uploads can request they be stashed, so as not to publish them immediately.
 119  147  			// In this case, a failure to stash ought to be fatal
@@ -188,9 +216,10 @@
 189  217  	protected function selectUploadModule() {
 190  218  		$request = $this->getMain()->getRequest();
 191  219  
 192       -		// One and only one of the following parameters is needed
 193       -		$this->requireOnlyOneParameter( $this->mParams,
 194       -			'filekey', 'file', 'url', 'statuskey' );
      220  +		// chunk or one and only one of the following parameters is needed
      221  +		if(!$this->mParams['chunk'])
      222  +			$this->requireOnlyOneParameter( $this->mParams,
      223  +				'filekey', 'file', 'url', 'statuskey' );
 195  224  
 196  225  		if ( $this->mParams['statuskey'] ) {
 197  226  			$this->checkAsyncDownloadEnabled();
@@ -234,6 +263,13 @@
 235  264  
 236  265  			$this->mUpload->initialize( $this->mParams['filekey'], $this->mParams['filename'] );
 237  266  
      267  +		} elseif ( isset( $this->mParams['chunk'] ) ) {
      268  +			// Start new Chunk upload
      269  +			$this->mUpload = new UploadFromFile();
      270  +			$this->mUpload->initialize(
      271  +				$this->mParams['filename'],
      272  +				$request->getUpload( 'chunk' )
      273  +			);
 238  274  		} elseif ( isset( $this->mParams['file'] ) ) {
 239  275  			$this->mUpload = new UploadFromFile();
 240  276  			$this->mUpload->initialize(
@@ -488,6 +524,10 @@
 489  525  			),
 490  526  			'stash' => false,
 491  527  
      528  +			'filesize' => null,
      529  +			'offset' => null,
      530  +			'chunk' => null,
      531  +
 492  532  			'asyncdownload' => false,
 493  533  			'leavemessage' => false,
 494  534  			'statuskey' => null,
@@ -511,6 +551,10 @@
 512  552  			'sessionkey' => 'Same as filekey, maintained for backward compatibility.',
 513  553  			'stash' => 'If set, the server will not add the file to the repository and stash it temporarily.',
 514  554  
      555  +			'chunk' => 'Chunk contents',
      556  +			'offset' => 'Offset of chunk in bytes',
      557  +			'filesize' => 'Filesize of entire upload',
      558  +
 515  559  			'asyncdownload' => 'Make fetching a URL asynchronous',
 516  560  			'leavemessage' => 'If asyncdownload is used, leave a message on the user talk page if finished',
 517  561  			'statuskey' => 'Fetch the upload status for this file key',
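Taken together, the two diffs define a small chunking protocol: the client posts chunk, offset, and filesize; the first chunk (offset 0) is stashed and yields a filekey; a later chunk is appended only when the stashed size equals the claimed offset; the response is 'Continue' until the final chunk completes the file, then 'Done'. A minimal Python simulation of that state machine (illustrative only; `upload_chunked` and the in-memory stand-in for the stash file are assumptions, not the real API or UploadStash):

```python
def upload_chunked(data: bytes, chunk_size: int, max_upload_size: int):
    """Simulate the chunk state machine added to ApiUpload::execute()."""
    filesize = len(data)
    if filesize > max_upload_size:
        # mirrors the 'file-too-large' pre-check on the 'filesize' parameter
        raise ValueError('file-too-large')

    stashed = b''        # stand-in for the UploadStash temp file
    offset = 0
    result = 'Continue'
    while offset < filesize:
        chunk = data[offset:offset + chunk_size]
        # appendChunk() rule: the stashed size must equal the claimed offset,
        # otherwise the server answers 'invalid-chunk-offset'
        if len(stashed) != offset:
            raise ValueError('invalid-chunk-offset')
        stashed += chunk
        offset += len(chunk)              # next offset, returned to the client
        result = 'Done' if offset == filesize else 'Continue'
    return result, stashed

result, stashed = upload_chunked(b'x' * 10, chunk_size=4, max_upload_size=100)
```

The offset equality test is the only ordering guarantee in this revision, which is what the review comments below take issue with.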

Sign-offs

User    | Flag      | Date
Catrope | inspected | 21:09, 10 August 2011

Follow-up revisions

Revision | Commit summary                                                      | Author | Date
r93725   | use tabs. for consistant whitespaces                                | j      | 11:01, 2 August 2011
r93960   | dont use deprecated function WebRequest::getFileSize                | j      | 08:49, 5 August 2011
r104659  | Use database to track uploaded chunks and concatenate at the end... | j      | 08:55, 30 November 2011
r104687  | Use database to track uploaded chunks and concatenate at the end... | j      | 14:56, 30 November 2011

Comments

#Comment by Reedy (talk | contribs)   10:36, 2 August 2011

You've got really mixed whitespace as tabs and spaces

#Comment by BotInc (talk | contribs)   10:59, 2 August 2011

Oops, should I commit a whitespace fix, or what would be the right way to do this?

#Comment by Reedy (talk | contribs)   10:59, 2 August 2011

Yeah, a whitespace fix commit is fine

#Comment by BotInc (talk | contribs)   11:02, 2 August 2011

whitespace commit in Special:Code/MediaWiki/93725

#Comment by P858snake (talk | contribs)   11:04, 2 August 2011

In future when you want to link revisions, in the commit summaries you can just include "rXXXXX" and code-review will do it automagically.

#Comment by Platonides (talk | contribs)   22:25, 4 August 2011

The non-deprecated function execute() calls the deprecated function WebRequest::getFileSize() at line 127.

#Comment by BotInc (talk | contribs)   12:25, 14 August 2011

fixed in r93960

#Comment by MarkAHershberger (talk | contribs)   17:42, 16 October 2011

I do not recall reviewing this code, so I'm re-marking it as new.

#Comment by Tim Starling (talk | contribs)   18:29, 16 October 2011

I can't see any synchronisation here. NFS doesn't have locking or atomic append operations, so what's to stop chunks from becoming interleaved or overwriting each other?

Also, filesize() on a file on NFS could potentially hit the stat cache in the NFS client, giving a delay of up to 2 seconds in file size updates. Say if you upload 3 chunks, and they are handled by 2 servers:

  1. Server A
  2. Server B
  3. Server A

if this occurs in less than the stat cache expiry time, the third chunk will get a file size that doesn't include the second chunk.

It probably makes sense to use the database for synchronisation instead of relying on FileRepo.
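The database-based synchronisation Tim suggests can be reduced to a conditional UPDATE: record the current stashed size in a table and accept a chunk only when the stored size still equals the offset the client sent. A sketch using SQLite (the one-row schema and helper name are assumptions, not the real uploadstash table):

```python
import sqlite3

# In-memory stand-in for the tracking table; the real schema would differ.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE chunk_state (filekey TEXT PRIMARY KEY, size INTEGER)')
db.execute("INSERT INTO chunk_state VALUES ('key1', 0)")

def try_append(filekey: str, offset: int, chunk_len: int) -> bool:
    """Atomically claim the right to append at `offset`.

    The UPDATE succeeds only when the stored size matches the claimed offset,
    so two servers handling the same chunk cannot both win, regardless of any
    NFS stat-cache staleness on the file itself.
    """
    cur = db.execute(
        'UPDATE chunk_state SET size = size + ? WHERE filekey = ? AND size = ?',
        (chunk_len, filekey, offset))
    db.commit()
    return cur.rowcount == 1    # False would map to 'invalid-chunk-offset'

first = try_append('key1', 0, 1024)   # succeeds, size becomes 1024
dup   = try_append('key1', 0, 1024)   # same offset replayed: rejected
```

Because the size lives in the database rather than in a stat() result, the two-second NFS attribute-cache window described above no longer matters for ordering.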

#Comment by NeilK (talk | contribs)   17:49, 18 October 2011

I think with modern NFS implementations, locking is effective (or you can use tricks with hard links to guarantee an exclusive write). Can we use those here, to prevent overwrites?

Also, the 'offset' parameter tells the chunk appender where to start. If, before the chunk is written, the file isn't exactly that offset in size, the request fails and the client has to try again.

#Comment by GICodeWarrior (talk | contribs)   18:04, 18 October 2011

On a well-configured, homogeneous network with supported applications, NFS locking is effective. In many cases, though, it is not (for example with mixed client/server types/versions like FreeBSD, Linux, Solaris, Windows, etc.), and it is generally not enabled by default.  :-(

#Comment by NeilK (talk | contribs)   18:31, 18 October 2011

Replying to myself... I see that you already understood the offset thing and your point was that it wasn't a reliable way to determine if it was okay to append a chunk.

I've read a bit more about how NFS handles these situations. (http://nfs.sourceforge.net/#faq_a8). Things seem to have gotten a lot better in recent Linux NFS clients, and to some extent the problem of cached stat data and non-atomic appends might be mitigated by using locks, which suck less than they did. But this is a bit out of my depth; perhaps the simplest thing is to use MySQL to synchronize writers.

While we're thinking about this stuff, we should also use PHP clearstatcache().

We had another proposal where the server asked the browser to just supply whatever chunk came after the file's current size. That's in some ways a lot easier to coordinate.

#Comment by NeilK (talk | contribs)   18:55, 18 October 2011

That said, a database might also help with Swift (when that finally is deployed).

We have an implementation of a "chunked" upload mechanism there, but as far as I can tell we can't actually append to a Swift object in separate invocations. The client.py instead implements chunked *transfer* encoding, which is just a convention for streaming data in when we don't know its ultimate length. It may be that Russell was expecting the same thing to be happening from browser to server.

http://www.mediawiki.org/wiki/Extension:SwiftMedia#client.py_mods

#Comment by Platonides (talk | contribs)   19:19, 18 October 2011

What about requiring a hash of the file when using chunk upload?

That way we would be certain of uploading the complete files, regardless of possible race conditions that we may not be able to mitigate 100%.
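The end-to-end check proposed here would look roughly like this: the client supplies a hash of the whole file up front, and after the last chunk the server compares the assembled bytes against it, catching lost or interleaved chunks after the fact. (The helper below is an illustrative sketch, not part of this revision; the replies that follow debate its cost.)

```python
import hashlib

def verify_assembled(assembled: bytes, claimed_sha1: str) -> bool:
    """Compare the assembled upload against the client-supplied SHA-1."""
    return hashlib.sha1(assembled).hexdigest() == claimed_sha1

whole = b'chunk1' + b'chunk2'
claimed = hashlib.sha1(whole).hexdigest()

ok = verify_assembled(whole, claimed)                  # correct assembly
bad = verify_assembled(b'chunk2' + b'chunk1', claimed) # wrong order detected
```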

#Comment by NeilK (talk | contribs)   22:01, 18 October 2011

- At the moment, few browsers could ever hash the file client side, period.

- Reading the entire file may be prohibitively expensive. We do read entire files in for thumbnailing, on some browsers, but there are max sizes.

- That's not even the problem we're trying to solve here. We're not trying to detect errors after the fact, we're trying to make the upload process more reliable. They aren't the same thing.

#Comment by Platonides (talk | contribs)   19:59, 23 October 2011

I was thinking of upload bots, not browsers.

I don't think "reliable" is the word you want here; discarding wrong data instead of silently accepting it does improve reliability (albeit it's costly):

reliable: such that either a sent packet will reach its destination, even if it requires retransmission, or the sender will be told that it didn't. The "or the sender will be told that it didn't" part is the important one.

#Comment by Platonides (talk | contribs)   19:26, 18 October 2011

If appendToUploadFile() were passed the offset instead of relying on the file size, it would still process chunks correctly even if they were sent in the wrong order.
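This point can be illustrated with positional writes: if the append step took the offset explicitly, it could seek() to that position, so a chunk arriving out of order would still land in the right place. (The helper is an illustrative sketch, not the real appendToUploadFile().)

```python
import os
import tempfile

def write_at_offset(path: str, offset: int, chunk: bytes) -> None:
    """Write `chunk` at the given byte offset, creating the file if needed."""
    mode = 'r+b' if os.path.exists(path) else 'wb'
    with open(path, mode) as f:
        f.seek(offset)
        f.write(chunk)

path = os.path.join(tempfile.mkdtemp(), 'assembly')
write_at_offset(path, 6, b'second')   # second chunk arrives first
write_at_offset(path, 0, b'first.')   # first chunk fills in behind it
```

Note this only fixes ordering; concurrent writers on NFS would still need the locking or database synchronisation discussed above.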

#Comment by BotInc (talk | contribs)   14:07, 17 October 2011

Does this need a new database field, or is there one that could be used? Isn't FileRepo already caching the file size in the db?

#Comment by Tim Starling (talk | contribs)   22:29, 18 October 2011

Swift cannot support append operations efficiently, and other content-addressed storage systems are probably the same. Even with filesystems which ostensibly support appending, such as NFS, the implementation would be complex. So our current design for the FileBackend refactor project does not include support for appending, instead it has a concatenate operation.

Each chunk would be stored to the backend as a separate file. Then the concatenate operation would combine the pre-stored chunks in a specified order, storing the result to a single large file.
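The concatenate design described above is straightforward to sketch: each chunk lands in its own backend file, and a final pass writes them out in the specified order to one large file. (The file layout and helper names are illustrative assumptions, not the FileBackend API.)

```python
import os
import tempfile

def store_chunk(directory: str, index: int, data: bytes) -> str:
    """Store one chunk as its own file; no append support is needed."""
    path = os.path.join(directory, 'chunk.%d' % index)
    with open(path, 'wb') as f:
        f.write(data)
    return path

def concatenate(paths: list, dest: str) -> None:
    """Combine pre-stored chunks, in the given order, into one file."""
    with open(dest, 'wb') as out:
        for p in paths:
            with open(p, 'rb') as f:
                out.write(f.read())

d = tempfile.mkdtemp()
paths = [store_chunk(d, i, data) for i, data in enumerate([b'aa', b'bb', b'cc'])]
concatenate(paths, os.path.join(d, 'final'))
```

Because chunks are independent objects until the final step, this works on backends like Swift that cannot append in place.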

#Comment by BotInc (talk | contribs)   09:23, 19 November 2011

mdale attached a patch to https://bugzilla.wikimedia.org/show_bug.cgi?id=29250 that uses the db to track the chunks and concatenates them at the end. Once it addresses all problems and is committed, it should resolve this fixme. Tim: can you have a look and let us know if there are any issues or concerns with the patch? Thanks.

Status & tagging log