r46683 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r46682‎ | r46683 | r46684 >
Date:18:40, 1 February 2009
Author:rarohde
Status:deferred (Comments)
Tags:
Comment:
Adds explicit fractional tolerance checks to comparison functions. Currently set to 1e-10.

Addresses concerns related to r46671.
Modified paths:
  • /trunk/extensions/ParserFunctions/Expr.php (modified) (history)

Diff [purge]

Index: trunk/extensions/ParserFunctions/Expr.php
@@ -46,6 +46,9 @@
4747 define( 'EXPR_POW', 35 );
4848 define( 'EXPR_PI', 36 );
4949
 50+// Tolerance for comparison and integer conversions
 51+define( 'EXPR_TOLERANCE', 1e-10 );
 52+
5053 class ExprError extends Exception {
5154 public function __construct($msg, $parameter = ''){
5255 wfLoadExtensionMessages( 'ParserFunctions' );
@@ -153,24 +156,39 @@
154157 );
155158
156159 /**
157 - * Checks if $expr is an integer within one part in 10^10.
158 - * If so, recast as integer and return, else return $expr unchanged.
159 - *
160 - * Explicit checking for round-off errors should eliminate
161 - * many bugs associated with floating point conversion to integers.
 160+ * Tests whether the fractional difference between two numbers
 161+ * is within EXPR_TOLERANCE of each other.
162162 */
163 - function checkInteger( $expr ) {
164 - $intval = (int)$expr;
165 -
166 - if( $expr >= 0 ) {
167 - if( $expr - $intval > 0.9999999999 ) { $intval += 1; }
 163+ function toleranceComparison( $a, $b ) {
 164+ if( $b == 0 || $a == 0 ) {
 165+ if( $a == $b ) {
 166+ return 0;
 167+ } elseif( $a > $b ) {
 168+ return 1;
 169+ } else {
 170+ return -1;
 171+ }
 172+ }
 173+
 174+ $c = (( $a / $b ) - ( $b / $a )) / 2.0;
 175+ if( abs( $c ) < EXPR_TOLERANCE ) {
 176+ return 0;
 177+ } elseif( $c > 0 ) {
 178+ return 1;
168179 } else {
169 - if( $expr - $intval < -0.9999999999 ) { $intval -= 1; }
 180+ return -1;
170181 }
 182+ }
171183
172 - if( abs( $intval - $expr ) < 0.0000000001 ) {
 184+ /**
 185+ * Checks if $expr is an integer within EXPR_TOLERANCE
 186+ * If so, recast as integer and return, else return $expr unchanged.
 187+ */
 188+ function checkInteger( $expr ) {
 189+ $intval = round($expr);
 190+ if( toleranceComparison( $expr, $intval ) == 0 ) {
173191 return $intval;
174 - } else {
 192+ } else {
175193 return $expr;
176194 }
177195 }
@@ -453,7 +471,7 @@
454472 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
455473 $right = array_pop( $stack );
456474 $left = array_pop( $stack );
457 - $stack[] = ( $this->checkInteger( $left ) == $this->checkInteger( $right ) ) ? 1 : 0;
 475+ $stack[] = ( $this->toleranceComparison( $left, $right ) == 0 ) ? 1 : 0;
458476 break;
459477 case EXPR_NOT:
460478 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
@@ -464,37 +482,37 @@
465483 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
466484 $digits = intval( array_pop( $stack ) );
467485 $value = array_pop( $stack );
468 - $stack[] = round( $this->checkInteger( $value ), $digits );
 486+ $stack[] = round( $value, $digits );
469487 break;
470488 case EXPR_LESS:
471489 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
472490 $right = array_pop( $stack );
473491 $left = array_pop( $stack );
474 - $stack[] = ( $this->checkInteger( $left ) < $this->checkInteger( $right ) ) ? 1 : 0;
 492+ $stack[] = ( $this->toleranceComparison( $left, $right ) < 0 ) ? 1 : 0;
475493 break;
476494 case EXPR_GREATER:
477495 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
478496 $right = array_pop( $stack );
479497 $left = array_pop( $stack );
480 - $stack[] = ( $this->checkInteger( $left ) > $this->checkInteger( $right ) ) ? 1 : 0;
 498+ $stack[] = ( $this->toleranceComparison( $left, $right ) > 0 ) ? 1 : 0;
481499 break;
482500 case EXPR_LESSEQ:
483501 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
484502 $right = array_pop( $stack );
485503 $left = array_pop( $stack );
486 - $stack[] = ( $this->checkInteger( $left ) <= $this->checkInteger( $right ) ) ? 1 : 0;
 504+ $stack[] = ( $this->toleranceComparison( $left, $right ) <= 0 ) ? 1 : 0;
487505 break;
488506 case EXPR_GREATEREQ:
489507 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
490508 $right = array_pop( $stack );
491509 $left = array_pop( $stack );
492 - $stack[] = ( $this->checkInteger( $left ) >= $this->checkInteger( $right ) ) ? 1 : 0;
 510+ $stack[] = ( $this->toleranceComparison( $left, $right ) >= 0 ) ? 1 : 0;
493511 break;
494512 case EXPR_NOTEQ:
495513 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
496514 $right = array_pop( $stack );
497515 $left = array_pop( $stack );
498 - $stack[] = ( $this->checkInteger( $left ) != $this->checkInteger( $right ) ) ? 1 : 0;
 516+ $stack[] = ( $this->toleranceComparison( $left, $right ) != 0 ) ? 1 : 0;
499517 break;
500518 case EXPR_EXPONENT:
501519 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);

Follow-up revisions

RevisionCommit summaryAuthorDate
r47200Revert "Adds explicit round-off checking to operations that are sensitive to ...werdna22:29, 12 February 2009

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r46671Adds explicit round-off checking to operations that are sensitive to floating...rarohde03:56, 1 February 2009

Comments

#Comment by Simetrical (talk | contribs)   19:44, 1 February 2009

I'm still not at all sure about this. It seems like it would probably have some kind of scary side effects somehow. More specific concerns:

  1. What made you choose a/b - b/a? I can't quite see how this is meant to behave, other than the obvious fact that it handles a few more cases correctly.
  2. Is 1e10 the best cutoff to choose? What's PHP's floating-point precision?
  3. This still makes strictly fewer things possible. It's possible to emulate this behavior under the current system by checking (a - b) < 1e-10 yourself, but it's *not* possible to emulate higher precision in the new system.

I think it might be better to keep the status quo for now. In particular, with respect to (3), I don't see the problem in just telling template programmers that they shouldn't test floating-point numbers for equality, like all other programmers.

#Comment by Dragons flight (talk | contribs)   21:24, 1 February 2009
  1. First, using a fractional expression rather than something like abs(a-b) < TOL, means the process is agnostic to the size of the variables. In other words, it correctly knows that 1.23e-30 != 4.6e-20, whereas a tolerance by difference would usually consider those the same. Regarding the technical detail, I used a/b - b/a because that is symmetric under interchange of parameters, whereas a definition like a/b - 1 can lead to situations a == b but b != a.
  2. PHP's precision is system dependent, but generally it is the same as C's double. A double usually gives 15-16 significant digits base 10. I chose 1e-10 as a compromise between preserving significant figures and wanting to limit the possibility that accumulating round-off error alone would exceed the tolerance. Since we aren't expecting Mediawiki's #expr function to be a tool for advanced computations, I would guess that 10 digits of precision is enough for nearly all comparisons. And as I said before, the alternative that 0.07*100 != 7 and 5/6 != (1/6)*5 is very confusing to everyday users.
  3. Forcing template programmers to add emulation everywhere they want to reliably use: floor, mod, ceil, trunc, and any of the comparison functions would require a large amount of extra template code, and in my opinion is very undesirable. By contrast, it IS still possible to emulate higher precision in those cases where people feel having more than ten significant figures is essential. Because the basic operations (e.g. +, -, *) do not round, one can break up a number, v, into high and low portions with constructions like v_low = v - round(v,10) and v_high = v - v_low before doing comparison testing. v_low will never have more than 5 or 6 real digits, and will often have fewer since the lowest ones may already be corrupted by round-off error in floating point operations, so the usability of this for comparison testing may be dodgy though. (The 10 in the round above actually needs to be replaced with a log expression that accounts for the place value of the highest significant digit, since round counts relative to a fixed decimal, but you get the idea.)

I would like to re-emphasize that this does not have any impact on basic mathematics. So someone who just wants to find {{#expr:78.4^1.2 - 67/92.1}} will still see all ~15 digits (or whatever PHP is able to give them). This is just about improving integer operations (floor, mod, etc.) and comparison functions so that everyday template coders don't have to go nuts trying to figure out why 5/6 != (1/6)*5.

Yes, there are some trade-offs associated with adding a specific tolerance, but I think the usability improvement associated with giving users math functions whose behavior is easy to understand, significantly outweighs the losses associated with limiting precision in a few cases.

Status & tagging log