r109469 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: < r109468 | r109469 | r109470 >
Date: 02:07, 19 January 2012
Author: aaron
Status: resolved (Comments)
Tags: filebackend
Comment:
In FileBackend/FileOp:
* Added a sane default max file size to FileBackend. Operation batches need to check this before trying anything.
* Temporarily adjust the PHP execution time limit in attemptBatch() to reduce the chance of dying in the middle of it. Also added a maximum batch size limit.
* Added some code comments.
Modified paths:
  • /trunk/phase3/includes/AutoLoader.php (modified)
  • /trunk/phase3/includes/filerepo/backend/FileBackend.php (modified)
  • /trunk/phase3/includes/filerepo/backend/FileOp.php (modified)
  • /trunk/phase3/languages/messages/MessagesEn.php (modified)
  • /trunk/phase3/maintenance/language/messages.inc (modified)

Diff

Index: trunk/phase3/maintenance/language/messages.inc
@@ -1367,7 +1367,8 @@
 		'backend-fail-synced',
 		'backend-fail-connect',
 		'backend-fail-internal',
-		'backend-fail-contenttype'
+		'backend-fail-contenttype',
+		'backend-fail-batchsize'
 	),
 
 	'lockmanager-errors' => array(
Index: trunk/phase3/includes/filerepo/backend/FileOp.php
@@ -34,6 +34,10 @@
 	const STATE_CHECKED = 2;
 	const STATE_ATTEMPTED = 3;
 
+	/* Timeout related parameters */
+	const MAX_BATCH_SIZE = 1000;
+	const TIME_LIMIT_SEC = 300; // 5 minutes
+
 	/**
 	 * Build a new file operation transaction
 	 *
@@ -52,6 +56,10 @@
 
 	/**
 	 * Allow stale data for file reads and existence checks.
+	 *
+	 * Note that we don't want to mix stale and non-stale reads
+	 * because stat calls are cached: if we read X without 'latest'
+	 * and then read it with 'latest', the data may still be stale.
 	 *
 	 * @return void
 	 */
@@ -80,6 +88,12 @@
 		$allowStale = !empty( $opts['allowStale'] );
 		$ignoreErrors = !empty( $opts['force'] );
 
+		$n = count( $performOps );
+		if ( $n > self::MAX_BATCH_SIZE ) {
+			$status->fatal( 'backend-fail-batchsize', $n, self::MAX_BATCH_SIZE );
+			return $status;
+		}
+
 		$predicates = FileOp::newPredicates(); // account for previous op in prechecks
 		// Do pre-checks for each operation; abort on failure...
 		foreach ( $performOps as $index => $fileOp ) {
@@ -97,6 +111,11 @@
 			}
 		}
 
+		// Restart PHP's execution timer and set the timeout to safe amount.
+		// This handles cases where the operations take a long time or where we are
+		// already running low on time left. The old timeout is restored afterwards.
+		$scopedTimeLimit = new FileOpScopedPHPTimeout( self::TIME_LIMIT_SEC );
+
 		// Attempt each operation...
 		foreach ( $performOps as $index => $fileOp ) {
 			if ( $fileOp->failed() ) {
@@ -326,6 +345,39 @@
 }
 
 /**
+ * FileOp helper class to expand PHP execution time for a function.
+ * On construction, set_time_limit() is called and set to $seconds.
+ * When the object goes out of scope, the timer is restarted, with
+ * the original time limit minus the time the object existed.
+ */
+class FileOpScopedPHPTimeout {
+	protected $startTime; // integer seconds
+	protected $oldTimeout; // integer seconds
+
+	/**
+	 * @param $seconds integer
+	 */
+	public function __construct( $seconds ) {
+		if ( ini_get( 'max_execution_time' ) > 0 ) { // CLI uses 0
+			$this->oldTimeout = ini_set( 'max_execution_time', $seconds );
+		}
+		$this->startTime = time();
+	}
+
+	/*
+	 * Restore the original timeout.
+	 * This does not account for the timer value on __construct().
+	 */
+	public function __destruct() {
+		if ( $this->oldTimeout ) {
+			$elapsed = time() - $this->startTime;
+			// Note: a limit of 0 is treated as "forever"
+			set_time_limit( max( 1, $this->oldTimeout - $elapsed ) );
+		}
+	}
+}
+
+/**
  * Store a file into the backend from a file on the file system.
  * Parameters similar to FileBackend::storeInternal(), which include:
  *     src : source path on file system
@@ -345,6 +397,11 @@
 			$status->fatal( 'backend-fail-notexists', $this->params['src'] );
 			return $status;
 		}
+		// Check if the source file is too big
+		if ( filesize( $this->params['src'] ) > $this->backend->maxFileSizeInternal() ) {
+			$status->fatal( 'backend-fail-store', $this->params['dst'] );
+			return $status;
+		}
 		// Check if destination file exists
 		$status->merge( $this->precheckDestExistence( $predicates ) );
 		if ( !$status->isOK() ) {
@@ -395,6 +452,11 @@
 
 	protected function doPrecheck( array &$predicates ) {
 		$status = Status::newGood();
+		// Check if the source data is too big
+		if ( strlen( $this->params['content'] ) > $this->backend->maxFileSizeInternal() ) {
+			$status->fatal( 'backend-fail-create', $this->params['dst'] );
+			return $status;
+		}
 		// Check if destination file exists
 		$status->merge( $this->precheckDestExistence( $predicates ) );
 		if ( !$status->isOK() ) {
Index: trunk/phase3/includes/filerepo/backend/FileBackend.php
@@ -592,7 +592,20 @@
 	/** @var Array */
 	protected $shardViaHashLevels = array(); // (container name => integer)
 
+	protected $maxFileSize = 1000000000; // integer bytes (1GB)
+
 	/**
+	 * Get the maximum allowable file size given backend
+	 * medium restrictions and basic performance constraints.
+	 * Do not call this function from places outside FileBackend and FileOp.
+	 *
+	 * @return integer Bytes
+	 */
+	final public function maxFileSizeInternal() {
+		return $this->maxFileSize;
+	}
+
+	/**
 	 * Create a file in the backend with the given contents.
 	 * Do not call this function from places outside FileBackend and FileOp.
 	 *
@@ -605,8 +618,12 @@
 	 * @return Status
 	 */
 	final public function createInternal( array $params ) {
-		$status = $this->doCreateInternal( $params );
-		$this->clearCache( array( $params['dst'] ) );
+		if ( strlen( $params['content'] ) > $this->maxFileSizeInternal() ) {
+			$status = Status::newFatal( 'backend-fail-create', $params['dst'] );
+		} else {
+			$status = $this->doCreateInternal( $params );
+			$this->clearCache( array( $params['dst'] ) );
+		}
 		return $status;
 	}
 
@@ -628,8 +645,12 @@
 	 * @return Status
 	 */
 	final public function storeInternal( array $params ) {
-		$status = $this->doStoreInternal( $params );
-		$this->clearCache( array( $params['dst'] ) );
+		if ( filesize( $params['src'] ) > $this->maxFileSizeInternal() ) {
+			$status = Status::newFatal( 'backend-fail-store', $params['dst'] );
+		} else {
+			$status = $this->doStoreInternal( $params );
+			$this->clearCache( array( $params['dst'] ) );
+		}
 		return $status;
 	}
 
Index: trunk/phase3/includes/AutoLoader.php
@@ -511,6 +511,7 @@
 	'MySqlLockManager'=> 'includes/filerepo/backend/lockmanager/DBLockManager.php',
 	'NullLockManager' => 'includes/filerepo/backend/lockmanager/LockManager.php',
 	'FileOp' => 'includes/filerepo/backend/FileOp.php',
+	'FileOpScopedPHPTimeout' => 'includes/filerepo/backend/FileOp.php',
 	'StoreFileOp' => 'includes/filerepo/backend/FileOp.php',
 	'CopyFileOp' => 'includes/filerepo/backend/FileOp.php',
 	'MoveFileOp' => 'includes/filerepo/backend/FileOp.php',
Index: trunk/phase3/languages/messages/MessagesEn.php
@@ -2258,6 +2258,7 @@
 'backend-fail-connect' => 'Could not connect to file backend "$1".',
 'backend-fail-internal' => 'An unknown error occurred in file backend "$1".',
 'backend-fail-contenttype' => 'Could not determine the content type of file to store at "$1".',
+'backend-fail-batchsize' => 'Backend given a batch of $1 file {{PLURAL:$1|operation|operations}}; the limit is $2.',
 
 # Lock manager
 'lockmanager-notlocked' => 'Could not unlock "$1"; it is not locked.',

Follow-up revisions

Revision | Commit summary | Author | Date
r109583 | * Follow-up r109009: Check that paths are usable in FileOp::doPrecheck(). Als... | aaron | 23:18, 19 January 2012
r110812 | r109469: protect FileOpScopedPHPTimeout from abuse by infinite loops. Also co... | aaron | 00:35, 7 February 2012
r113209 | r109469: ported r113208 by bumping max file size from 1GB to 4GiB. | aaron | 02:00, 7 March 2012

Comments

#Comment by Siebrand (talk | contribs)   07:48, 19 January 2012

$2 also needs plural

#Comment by Aaron Schulz (talk | contribs)   08:09, 19 January 2012

Why (in EN)? It's either "the limit is 1" or "the limit is 2+", isn't it?
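
For context, the point of wrapping a number like $2 in PLURAL is that some languages use different plural forms even where the English text does not vary. A hedged sketch of what the MessagesEn.php entry could look like with Siebrand's suggestion applied (hypothetical wording, not what was committed):

'backend-fail-batchsize' => 'Backend given a batch of $1 file {{PLURAL:$1|operation|operations}}; the limit is $2 {{PLURAL:$2|operation|operations}}.',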

#Comment by Tim Starling (talk | contribs)   05:40, 1 February 2012

This seems like a very large can of worms that you're opening here.

You're calling set_time_limit() for every file operation. Suppose some caller is in an infinite loop calling some file operation, but the file operation itself is quite fast -- say 1% of each loop iteration. So the chances that time() will have a different value in __construct() and __destruct() would be only 1%, so 99 times out of 100, __destruct() will just restore the timer with its initial value. So the timer interval will slowly decay over a period of 100 times the configured interval, before it finally gets to 1 second. As long as the loop continues to take less than 1s per iteration, it will just continue forever.

Do we really need this?
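
To make the decay concrete, here is a minimal sketch of the scenario described above (a hypothetical buggy caller; in the patch the scoped object is actually created inside FileOp::attemptBatch(), not by callers directly):

// Hypothetical helpers standing in for "most of the loop" and "a fast file op"
$slowNonFileWork = function () { usleep( 990000 ); };   // ~0.99s spent elsewhere
$fastFileOperation = function () { usleep( 10000 ); };  // ~0.01s under the scope

while ( true ) {
	$slowNonFileWork();
	$scopedTimeLimit = new FileOpScopedPHPTimeout( 300 );
	$fastFileOperation();
	unset( $scopedTimeLimit );
	// __destruct() restores max_execution_time to (old limit - elapsed seconds).
	// Elapsed is usually 0 here, so the old value comes back unchanged; about
	// 1 iteration in 100 loses a second. Meanwhile set_time_limit() restarts
	// PHP's execution timer on every call, so the limit decays toward 1 second
	// over roughly 100 * (original limit) iterations but never actually fires
	// while each iteration stays under a second.
}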

#Comment by Tim Starling (talk | contribs)   05:01, 10 February 2012

The 1GB hard-coded maximum seems a bit low, considering that Erik Moeller just uploaded this: commons:File:Monthly Metrics Meeting February 2, 2012.theora.ogv. Is there some sort of significance to that number?


Looks fine (resolved) otherwise.

#Comment by Aaron Schulz (talk | contribs)   20:36, 12 February 2012

It could be set to 5GB at the most. I'd be more comfortable with larger files if I felt better about file op timeouts...

#Comment by Aaron Schulz (talk | contribs)   20:43, 12 February 2012

Well, 5GB is for swift. Typical file systems can go much higher, of course, though then you have the problem of operations taking longer.

#Comment by Tim Starling (talk | contribs)   23:51, 12 February 2012

If it's about performance then the limit should be configurable.

#Comment by Aaron Schulz (talk | contribs)   00:04, 13 February 2012

Well, if file ops time out halfway through, it's also about correctness. Subclasses can make it configurable if they want, capped at some technical upper limit.
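
A hedged sketch of that idea (hypothetical subclass and config key, not part of this revision): a backend subclass accepts a configured limit while clamping it to whatever its storage medium can actually handle.

/**
 * Hypothetical backend subclass; doStoreInternal(), doCreateInternal(), etc.
 * are omitted for brevity. Assumes the parent constructor takes an array of
 * settings, as FileBackend does in this revision.
 */
abstract class ExampleCappedFileBackend extends FileBackend {
	public function __construct( array $config ) {
		parent::__construct( $config );
		$hardLimit = 5 * 1024 * 1024 * 1024; // e.g. a 5GB single-object cap
		if ( isset( $config['maxFileSize'] ) ) {
			// Configurable, but never above the medium's technical upper limit
			$this->maxFileSize = min( (int)$config['maxFileSize'], $hardLimit );
		}
	}
}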

#Comment by Tim Starling (talk | contribs)   02:56, 7 March 2012

Sure, but I mean that if you're trying to limit it to the largest size that can be stored reliably within a given time, then that size will change depending on the system performance. So to be compatible with present and future systems of varying performance, the limit should be configurable.

#Comment by Aaron Schulz (talk | contribs)   19:16, 7 March 2012

We really need a way to do ops in parallel...which is even a bit of a problem in swift.php right now.

Status & tagging log