r110435 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r110434‎ | r110435 | r110436 >
Date:21:52, 31 January 2012
Author:aaron
Status:ok (Comments)
Tags:todo 
Comment:
Expanded 'shardViaHashLevels' config var in FileBackendStore to be able to recognize FileRepo-style deleted zone hash paths (which are different than for the other zones).
Modified paths:
  • /trunk/phase3/includes/filerepo/backend/FileBackend.php (modified) (history)
  • /trunk/phase3/includes/filerepo/backend/SwiftFileBackend.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/filerepo/backend/SwiftFileBackend.php
@@ -39,7 +39,11 @@
4040 * swiftKey : Swift authentication key for the above user
4141 * swiftAuthTTL : Swift authentication TTL (seconds)
4242 * swiftAnonUser : Swift user used for end-user requests (account:username)
43 - * shardViaHashLevels : Map of container names to the number of hash levels
 43+ * shardViaHashLevels : Map of container names to sharding config with:
 44+ * 'base' : base of hash characters, 16 or 36
 45+ * 'levels' : the number of hash levels (and digits)
 46+ * 'repeat' : hash subdirectories are prefixed with all the
 47+ * parent hash directory names (e.g. "a/ab/abc")
4448 */
4549 public function __construct( array $config ) {
4650 parent::__construct( $config );
Index: trunk/phase3/includes/filerepo/backend/FileBackend.php
@@ -696,8 +696,8 @@
697697 protected $expCache = array(); // (storage path => key => value)
698698 protected $maxExpCacheSize = 10; // integer; max paths with entries
699699
700 - /** @var Array */
701 - protected $shardViaHashLevels = array(); // (container name => integer)
 700+ /** @var Array Map of container names to sharding settings */
 701+ protected $shardViaHashLevels = array(); // (container name => config array)
702702
703703 protected $maxFileSize = 1000000000; // integer bytes (1GB)
704704
@@ -1492,40 +1492,53 @@
14931493 * @return string|null Returns null if shard could not be determined
14941494 */
14951495 final protected function getContainerShard( $container, $relPath ) {
1496 - $hashLevels = $this->getContainerHashLevels( $container );
1497 - if ( $hashLevels === 1 ) { // 16 shards per container
1498 - $hashDirRegex = '(?P<shard>[0-9a-f])';
1499 - } elseif ( $hashLevels === 2 ) { // 256 shards per container
1500 - $hashDirRegex = '[0-9a-f]/(?P<shard>[0-9a-f]{2})';
1501 - } else {
1502 - return ''; // no sharding
 1496+ list( $levels, $base, $repeat ) = $this->getContainerHashLevels( $container );
 1497+ if ( $levels == 1 || $levels == 2 ) {
 1498+ // Hash characters are either base 16 or 36
 1499+ $char = ( $base == 36 ) ? '[0-9a-z]' : '[0-9a-f]';
 1500+ // Get a regex that represents the shard portion of paths.
 1501+ // The concatenation of the captures gives us the shard.
 1502+ if ( $levels === 1 ) { // 16 or 36 shards per container
 1503+ $hashDirRegex = '(' . $char . ')';
 1504+ } else { // 256 or 1296 shards per container
 1505+ if ( $repeat ) { // verbose hash dir format (e.g. "a/ab/abc")
 1506+ $hashDirRegex = $char . '/(' . $char . '{2})';
 1507+ } else { // short hash dir format (e.g. "a/b/c")
 1508+ $hashDirRegex = '(' . $char . ')/(' . $char . ')';
 1509+ }
 1510+ }
 1511+ // Allow certain directories to be above the hash dirs so as
 1512+ // to work with FileRepo (e.g. "archive/a/ab" or "temp/a/ab").
 1513+ // They must be 2+ chars to avoid any hash directory ambiguity.
 1514+ $m = array();
 1515+ if ( preg_match( "!^(?:[^/]{2,}/)*$hashDirRegex(?:/|$)!", $relPath, $m ) ) {
 1516+ return '.' . implode( '', array_slice( $m, 1 ) );
 1517+ }
 1518+ return null; // failed to match
15031519 }
1504 - // Allow certain directories to be above the hash dirs so as
1505 - // to work with FileRepo (e.g. "archive/a/ab" or "temp/a/ab").
1506 - // They must be 2+ chars to avoid any hash directory ambiguity.
1507 - $m = array();
1508 - if ( preg_match( "!^(?:[^/]{2,}/)*$hashDirRegex(?:/|$)!", $relPath, $m ) ) {
1509 - return '.' . $m['shard'];
1510 - }
1511 - return null; // failed to match
 1520+ return ''; // no sharding
15121521 }
15131522
15141523 /**
1515 - * Get the number of hash levels for a container.
 1524+ * Get the sharding config for a container.
15161525 * If greater than 0, then all file storage paths within
15171526 * the container are required to be hashed accordingly.
15181527 *
15191528 * @param $container string
1520 - * @return integer
 1529+ * @return Array (integer levels, integer base, repeat flag) or (0, 0, false)
15211530 */
15221531 final protected function getContainerHashLevels( $container ) {
15231532 if ( isset( $this->shardViaHashLevels[$container] ) ) {
1524 - $hashLevels = (int)$this->shardViaHashLevels[$container];
1525 - if ( $hashLevels >= 0 && $hashLevels <= 2 ) {
1526 - return $hashLevels;
 1533+ $config = $this->shardViaHashLevels[$container];
 1534+ $hashLevels = (int)$config['levels'];
 1535+ if ( $hashLevels == 0 || $hashLevels == 2 ) {
 1536+ $hashBase = (int)$config['base'];
 1537+ if ( $hashBase == 16 || $hashBase == 36 ) {
 1538+ return array( $hashLevels, $hashBase, $config['repeat'] );
 1539+ }
15271540 }
15281541 }
1529 - return 0; // no sharding
 1542+ return array( 0, 0, false ); // no sharding
15301543 }
15311544
15321545 /**
@@ -1536,11 +1549,11 @@
15371550 */
15381551 final protected function getContainerSuffixes( $container ) {
15391552 $shards = array();
1540 - $digits = $this->getContainerHashLevels( $container );
 1553+ list( $digits, $base ) = $this->getContainerHashLevels( $container );
15411554 if ( $digits > 0 ) {
1542 - $numShards = 1 << ( $digits * 4 );
 1555+ $numShards = pow( $base, $digits );
15431556 for ( $index = 0; $index < $numShards; $index++ ) {
1544 - $shards[] = '.' . str_pad( dechex( $index ), $digits, '0', STR_PAD_LEFT );
 1557+ $shards[] = '.' . wfBaseConvert( $index, 10, $base, $digits );
15451558 }
15461559 }
15471560 return $shards;

Follow-up revisions

RevisionCommit summaryAuthorDate
r110462Quick fix to getContainerHashLevels() comparison from r110435.aaron04:58, 1 February 2012

Comments

#Comment by Aaron Schulz (talk | contribs)   21:55, 31 January 2012

I discovered this trying to figure out why file deletions were timing out on my test wiki. It wasn't recognizing the hash paths in prepare() calls and thus starting creating 256 container shards to prepare all shards.

#Comment by Tim Starling (talk | contribs)   02:17, 1 February 2012

You should probably throw an exception if the base is something other than 16 or 36, or if the number of levels is something other than 0 or 2, rather than ignoring the setting.

-			$numShards = 1 << ( $digits * 4 );
+			$numShards = pow( $base, $digits );

I was worried about whether pow() is exact for integer arguments, so I checked the source. It is exact.

#Comment by Aaron Schulz (talk | contribs)   04:57, 1 February 2012
if ( $hashLevels == 0 || $hashLevels == 2 ) {

This is wrong. I caught myself with this mistake in another spot but apparently not here. It just works due to the paranoid check in getContainerHashLevels().

#Comment by Aaron Schulz (talk | contribs)   05:01, 1 February 2012

I meant "paranoid check in getContainerShard()"...err

Status & tagging log