r111964 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r111963‎ | r111964 | r111965 >
Date:21:22, 20 February 2012
Author:dantman
Status:reverted (Comments)
Tags:
Comment:
Commit a new cryptographic random generator class for use in MediaWiki.
Waiting for it to be reviewed before actually making use of it inside code and adding a RELEASE-NOTES entry.
Modified paths:
  • /trunk/phase3/includes/AutoLoader.php (modified) (history)
  • /trunk/phase3/includes/CryptRand.php (added) (history)

Diff [purge]

Index: trunk/phase3/includes/CryptRand.php
@@ -0,0 +1,252 @@
 2+<?php
 3+/**
 4+ * A static cryptographic random generator class used for generating secret keys
 5+ *
 6+ * This is based in part on Drupal code as well as what we used in our own code
 7+ * prior to introduction of this class.
 8+ *
 9+ * @file
 10+ */
 11+
 12+final class MWCryptRand {
 13+
 14+ /**
 15+ * Initialize an initial random state based off of whatever we can find
 16+ */
 17+ private static function initialRandomState() {
 18+ // $_SERVER contains a variety of unstable user and system specific information
 19+ // It'll vary a little with each page, and vary even more with separate users
 20+ // It'll also vary slightly across different machines
 21+ $state = serialize( $_SERVER );
 22+
 23+ // To try and vary the system information of the state a bit more
 24+ // by including the system's hostname into the state
 25+ $state .= wfHostname();
 26+
 27+ // Try and make this a little more unstable by including the varying process
 28+ // id of the php process we are running inside of if we are able to access it
 29+ if ( function_exists( 'getmypid' ) ) {
 30+ $state .= getmypid();
 31+ }
 32+
 33+ // If available try to increase the instability of the data by throwing in
 34+ // the precise amount of memory that we happen to be using at the moment.
 35+ if ( function_exists( 'memory_get_usage' ) ) {
 36+ $state .= memory_get_usage( true );
 37+ }
 38+
 39+ // It's mostly worthless but throw the wiki's id into the data for a little more variance
 40+ $state .= wfWikiID();
 41+
 42+ // If we have a secret key or proxy key set then throw it into the state as well
 43+ global $wgSecretKey, $wgProxyKey;
 44+ if ( $wgSecretKey ) {
 45+ $state .= $wgSecretKey;
 46+ } elseif ( $wgProxyKey ) {
 47+ $state .= $wgProxyKey;
 48+ }
 49+
 50+ return $state;
 51+ }
 52+
 53+ /**
 54+ * Return a rolling random state initially build using data from unstable sources
 55+ * @return A new weak random state
 56+ */
 57+ public static function randomState() {
 58+ static $state = null;
 59+ if ( is_null( $state ) ) {
 60+ // Initialize the state with whatever unstable data we can find
 61+ // It's important that this data is hashed right afterwards to prevent
 62+ // it from being leaked into the output stream
 63+ $state = self::initialRandomState();
 64+ }
 65+ // Generate a new random state based on the initial random state or previous
 66+ // random state by combining it with both the current time and a random value
 67+ // Simple append/prepend based methods of combining data and a salt have
 68+ // weaknesses in them, take advantage of the availability of hmac to abuse
 69+ // it's method of combining data and a key into a hash which is free of
 70+ // the typical weakness of simple concatenation
 71+ // Note that in hmac large keys are reduced in size and the key is then
 72+ // xor-ed to create two separate keys. For this reason we use the smaller
 73+ // time+rand as the key and the larger state as the data.
 74+ // We also don't bother passing numbers to mt_rand since you can't make
 75+ // it generate with any more entropy than it's default max value.
 76+ $state = self::hmac( $state, microtime() . mt_rand() );
 77+ return $state;
 78+ }
 79+
 80+ /**
 81+ * Decide on the best acceptable hash algorithm we have available for hash()
 82+ * @return String A hash algorithm
 83+ */
 84+ private static function hashAlgo() {
 85+ static $algo = null;
 86+ if ( !is_null( $algo ) ) {
 87+ return $algo;
 88+ }
 89+
 90+ $algos = hash_algos();
 91+ $preference = array( 'whirlpool', 'sha256', 'sha1', 'md5' );
 92+
 93+ foreach ( $preference as $algorithm ) {
 94+ if ( in_array( $algorithm, $algos ) ) {
 95+ $algo = $algorithm; # assign to static
 96+ return $algo;
 97+ }
 98+ }
 99+
 100+ // We only reach here if no acceptable hash is found in the list, this should
 101+ // be a technical impossibility since most of php's hash list is fixed and
 102+ // some of the ones we list are available as their own native functions
 103+ // But since we already require at least 5.2 and hash() was default in
 104+ // 5.1.2 we don't bother falling back to methods like sha1 and md5.
 105+ throw new MWException( "Could not find an acceptable hashing function in hash_algos()" );
 106+ }
 107+
 108+ /**
 109+ * Generate an acceptably unstable one-way-hash of some text
 110+ * making use of the best hash algorithm that we have available.
 111+ *
 112+ * @return String A raw hash of the data
 113+ */
 114+ private static function hash( $data ) {
 115+ return hash( self::hashAlgo(), $data, true );
 116+ }
 117+
 118+ /**
 119+ * Generate an acceptably unstable one-way-hmac of some text
 120+ * making use of the best hash algorithm that we have available.
 121+ *
 122+ * @return String A raw hash of the data
 123+ */
 124+ private static function hmac( $data, $key ) {
 125+ return hash_hmac( self::hashAlgo(), $data, $key, true );
 126+ }
 127+
 128+
 129+
 130+ private static $strong = null;
 131+
 132+ /**
 133+ * Return a boolean indicating whether or not the source used for cryptographic
 134+ * random bytes generation in the previously run generate* call
 135+ * was cryptographically strong.
 136+ *
 137+ * @return bool Returns true if the source was strong, false if not.
 138+ */
 139+ public static function wasStrong() {
 140+ if ( is_null( self::$strong ) ) {
 141+ throw new MWException( __METHOD__ . ' called before generation of random data' );
 142+ }
 143+ return self::$strong;
 144+ }
 145+
 146+ /**
 147+ * Generate a run of (ideally) cryptographically random data and return
 148+ * it in raw binary form.
 149+ * You can use MWCryptRand::wasStrong() if you wish to know if the source used
 150+ * was cryptographically strong.
 151+ *
 152+ * @param $bytes int the number of bytes of random data to generate
 153+ * @return String Raw binary random data
 154+ */
 155+ public static function generate( $bytes ) {
 156+ $bytes = floor( $bytes );
 157+ static $buffer = '';
 158+ if ( is_null( self::$strong ) ) {
 159+ // Set strength to false initially until we know what source data is coming from
 160+ self::$strong = true;
 161+ }
 162+
 163+ if ( strlen( $buffer ) < $bytes ) {
 164+ // /dev/urandom is generally considered the best possible commonly
 165+ // available random source, and is available on most *nix systems.
 166+ wfSuppressWarnings();
 167+ $urandom = fopen( "/dev/urandom", "rb" );
 168+ wfRestoreWarnings();
 169+
 170+ // Attempt to read all our random data from urandom
 171+ // php's fread always does buffered reads based on the stream's chunk_size
 172+ // so in reality it will usually read more than the amount of data we're
 173+ // asked for and it doesn't cost anything extra to store that.
 174+ // We don't have access to the stream's chunk_size, fread maxes out at 8k
 175+ // so we'll go along with Drupal's decision to read at least 4k
 176+ if ( $urandom ) {
 177+ $buffer .= fread( $urandom, max( 1024 * 4, $bytes ) );
 178+ fclose( $urandom );
 179+ if ( strlen( $buffer ) >= $bytes ) {
 180+ // urandom is always strong, set to true if all our data was generated using it
 181+ self::$strong = true;
 182+ }
 183+ }
 184+ }
 185+
 186+ if ( strlen( $buffer ) < $bytes ) {
 187+ // If available and we failed to read enough data out of urandom make use
 188+ // of openssl's random_pesudo_bytes method to attempt to generate randomness.
 189+ // However don't do this on Windows with PHP < 5.3.4 due to a bug:
 190+ // http://stackoverflow.com/questions/1940168/openssl-random-pseudo-bytes-is-slow-php
 191+ if ( ( $bytes - strlen( $buffer ) > 0 )
 192+ && function_exists( 'openssl_random_pseudo_bytes' )
 193+ && ( !wfIsWindows() || version_compare( PHP_VERSION, '5.3.4', '>=' ) )
 194+ ) {
 195+ $buffer .= openssl_random_pseudo_bytes( $bytes - strlen( $buffer ), $openssl_strong );
 196+ if ( strlen( $buffer ) >= $bytes ) {
 197+ // openssl tells us if the random source was strong, if some of our data was generated
 198+ // using it use it's say on whether the randomness is strong
 199+ self::$strong = !!$openssl_strong;
 200+ }
 201+ }
 202+ }
 203+
 204+
 205+ // If we cannot use or generate enough data from /dev/urandom or openssl
 206+ // use this loop to generate a good set of pesudo random data.
 207+ // This works by initializing a random state using a pile of unstable data
 208+ // and continually shoving it through a hash along with a variable salt.
 209+ // We hash the random state with more salt to avoid the state from leaking
 210+ // out and being used to predict the /randomness/ that follows.
 211+ while ( strlen( $buffer ) < $bytes ) {
 212+ $buffer .= self::hmac( self::randomState(), mt_rand() );
 213+ // This code is never really cryptographically strong, if we use it
 214+ // at all, then set strong to false.
 215+ self::$strong = false;
 216+ }
 217+
 218+ // Once the buffer has been filled up with enough random data to fulfill
 219+ // the request shift off enough data to handle the request and leave the
 220+ // unused portion left inside the buffer for the next request for random data
 221+ $generated = substr( $buffer, 0, $bytes );
 222+ $buffer = substr( $buffer, $bytes );
 223+
 224+ return $generated;
 225+ }
 226+
 227+ /**
 228+ * Generate a run of (ideally) cryptographically random data and return
 229+ * it in hexadecimal string format.
 230+ * You can use MWCryptRand::wasStrong() if you wish to know if the source used
 231+ * was cryptographically strong.
 232+ *
 233+ * @param $chars int the number of hex chars of random data to generate
 234+ * @return String Hexadecimal random data
 235+ */
 236+ public static function generateHex( $chars ) {
 237+ // hex strings are 2x the length of raw binary so we divide the length in half
 238+ // odd numbers will result in a .5 that leads the generate() being 1 character
 239+ // short, so we use ceil() to ensure that we always have enough bytes
 240+ $bytes = ceil( $chars / 2 );
 241+ // Generate the data and then convert it to a hex string
 242+ $hex = bin2hex( self::generate( $bytes ) );
 243+ // A bit of paranoia here, the caller asked for a specific length of string
 244+ // here, and it's possible (eg when given an odd number) that we may actually
 245+ // have at least 1 char more than they asked for. Just in case they made this
 246+ // call intending to insert it into a database that does truncation we don't
 247+ // want to give them too much and end up with their database and their live
 248+ // code having two different values because part of what we gave them is truncated
 249+ // hence, we strip out any run of characters longer than what we were asked for.
 250+ return substr( $hex, 0, $chars );
 251+ }
 252+
 253+}
Index: trunk/phase3/includes/AutoLoader.php
@@ -49,6 +49,7 @@
5050 'ConfEditorToken' => 'includes/ConfEditor.php',
5151 'Cookie' => 'includes/Cookie.php',
5252 'CookieJar' => 'includes/Cookie.php',
 53+ 'MWCryptRand' => 'includes/CryptRand.php',
5354 'CurlHttpRequest' => 'includes/HttpFunctions.php',
5455 // 'DBDataObject' => 'includes/DBDataObject.php',
5556 // 'DBTable' => 'includes/DBTable.php',

Follow-up revisions

RevisionCommit summaryAuthorDate
r111974Revert CryptRand class in r111964 till after the git migration.dantman22:35, 20 February 2012

Comments

#Comment by 😂 (talk | contribs)   21:25, 20 February 2012

If nothing's using this yet, can it please wait a couple of weeks until after the git migration?

#Comment by Tim Starling (talk | contribs)   06:37, 9 March 2012

Could you keep the state in a singleton object instead of using static variables?

I strongly recommend using stream_set_read_buffer($urandom, 0) and reading only the minimum number of bytes from /dev/urandom, instead of reading an 8KB block and throwing away 4KB. My testing indicates that this will speed up the function by a factor of 100 or so for most applications, and avoid depleting the system entropy pool.

#Comment by Dantman (talk | contribs)   07:34, 9 March 2012

Any real reason to use a singleton? I originally considered using instances. But after thinking about the way the code reads from urandom and tries to avoid wasting randomness I decided it would be better to just keep it static. Anyone who tries to get more security by initializing a new instance other than the singleton risks depleting entropy or resetting the random state to the start reducing the unpredictability.

I incorporated your comments on streem_set_read_buffer and the chunk_size into my code ideas page along with some other notes. The future iteration should include those improvements along with some others.

#Comment by Dantman (talk | contribs)   09:53, 10 March 2012

I've cloned our core git repo and put MWCryptRand into a branch and pushed it to GitHub while it's being worked on. When it's done I can make it a svn commit or commit it to our git repo when we start production use of it. (I actually already converted a test/mediawiki/core clone into an incompatible mediawiki/core clone so any changes in the underlying repo won't be a problem)

I've already added stream_set_read_buffer and the 4k -> 8k change.

https://github.com/dantman/mediawiki-core/blob/cryptrand/includes/CryptRand.php

Status & tagging log