r47200 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r47199‎ | r47200 | r47201 >
Date:22:29, 12 February 2009
Author:werdna
Status:reverted (Comments)
Tags:
Comment:
Revert "Adds explicit round-off checking to operations that are sensitive to floating point vs. integer round-off errors.", et al. ( r46683, r46671 )
Reimplement, but better! Use BC Math library where possible, for fixed-point math operations. The tolerance aspect has been included by doing math internally at 16 digits, but discarding the last two digits for the purpose of comparison.
Modified paths:
  • /trunk/extensions/ParserFunctions/Expr.php (modified) (history)
  • /trunk/extensions/ParserFunctions/exprTests.txt (added) (history)
  • /trunk/extensions/ParserFunctions/testExpr.php (added) (history)

Diff [purge]

Index: trunk/extensions/ParserFunctions/exprTests.txt
@@ -0,0 +1,39 @@
 2+1 + 1 = 2
 3+-1 + 1 = 0
 4++1 + 1 = 2
 5+4 * 4 = 16
 6+-4 * -4 = 4 * 4
 7+(1/3) * 3 = 1
 8+3 / 1.5 = 2
 9+3 mod 2 = 1
 10+1 or 0
 11+not (1 and 0)
 12+not 0
 13+4.0 round 0; 4
 14+ceil 4; 4
 15+floor 4; 4
 16+4.5 round 0; 5
 17+4.2 round 0; 4
 18+-4.2 round 0; -4
 19+-4.5 round 0; -5
 20+-2.0 round 0; -2
 21+ceil -3; -3
 22+floor -6.0; -6
 23+ceil 4.2; 5
 24+ceil -4.5; -4
 25+floor -4.5; -5
 26+4 < 5
 27+-5 < 2
 28+-2 <= -2
 29+abs(-2); 2
 30+4 > 3
 31+4 > -3
 32+5 >= 2
 33+2 >= 2
 34+1 != 2
 35+not (1 != 1)
 36+1e4 = 10000
 37+1e-2 = 0.01
 38+ln(exp(1));1
 39+trunc(4.5);4
 40+trunc(-4.5);-4
Index: trunk/extensions/ParserFunctions/Expr.php
@@ -46,9 +46,6 @@
4747 define( 'EXPR_POW', 35 );
4848 define( 'EXPR_PI', 36 );
4949
50 -// Tolerance for comparison and integer conversions
51 -define( 'EXPR_TOLERANCE', 1e-10 );
52 -
5350 class ExprError extends Exception {
5451 public function __construct($msg, $parameter = ''){
5552 wfLoadExtensionMessages( 'ParserFunctions' );
@@ -154,46 +151,20 @@
155152 'ceil' => EXPR_CEIL,
156153 'pi' => EXPR_PI,
157154 );
158 -
159 - /**
160 - * Tests whether the fractional difference between two numbers
161 - * is within EXPR_TOLERANCE of each other.
162 - */
163 - function toleranceComparison( $a, $b ) {
164 - if( $b == 0 || $a == 0 ) {
165 - if( $a == $b ) {
166 - return 0;
167 - } elseif( $a > $b ) {
168 - return 1;
169 - } else {
170 - return -1;
171 - }
172 - }
173155
174 - $c = (( $a / $b ) - ( $b / $a )) / 2.0;
175 - if( abs( $c ) < EXPR_TOLERANCE ) {
176 - return 0;
177 - } elseif( $c > 0 ) {
178 - return 1;
179 - } else {
180 - return -1;
 156+ function haveBC() {
 157+ static $haveBC = null;
 158+
 159+ if ($haveBC === null) {
 160+ $haveBC = extension_loaded( 'bcmath' );
 161+ if ($haveBC) // Set to precision of 14.
 162+ bcscale(16);
181163 }
 164+
 165+ return $haveBC;
182166 }
183167
184168 /**
185 - * Checks if $expr is an integer within EXPR_TOLERANCE
186 - * If so, recast as integer and return, else return $expr unchanged.
187 - */
188 - function checkInteger( $expr ) {
189 - $intval = round($expr);
190 - if( $this->toleranceComparison( $expr, $intval ) == 0 ) {
191 - return $intval;
192 - } else {
193 - return $expr;
194 - }
195 - }
196 -
197 - /**
198169 * Evaluate a mathematical expression
199170 *
200171 * The algorithm here is based on the infix to RPN algorithm given in
@@ -414,11 +385,16 @@
415386 }
416387
417388 function doOperation( $op, &$stack ) {
 389+ $haveBC = $this->haveBC();
418390 switch ( $op ) {
419391 case EXPR_NEGATIVE:
420392 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
421393 $arg = array_pop( $stack );
422 - $stack[] = -$arg;
 394+
 395+ if ($haveBC)
 396+ $stack[] = bcmul( '-1', $arg );
 397+ else
 398+ $stack[] = -$arg;
423399 break;
424400 case EXPR_POSITIVE:
425401 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
@@ -427,98 +403,140 @@
428404 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
429405 $right = array_pop( $stack );
430406 $left = array_pop( $stack );
431 - $stack[] = $left * $right;
432 - break;
 407+ if ($haveBC)
 408+ $stack[] = bcmul( $left, $right );
 409+ else
 410+ $stack[] = $left * $right;
 411+ break;
433412 case EXPR_DIVIDE:
434413 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
435414 $right = array_pop( $stack );
436415 $left = array_pop( $stack );
437416 if ( $right == 0 ) throw new ExprError('division_by_zero', $this->names[$op]);
438 - $stack[] = $left / $right;
 417+ if ($haveBC)
 418+ $stack[] = bcdiv( $left, $right );
 419+ else
 420+ $stack[] = $left / $right;
439421 break;
440422 case EXPR_MOD:
441423 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
442424 $right = array_pop( $stack );
443425 $left = array_pop( $stack );
444426 if ( $right == 0 ) throw new ExprError('division_by_zero', $this->names[$op]);
445 - $stack[] = $this->checkInteger( $left ) % $this->checkInteger( $right );
 427+ if ($haveBC)
 428+ $stack[] = bcmod( $left, $right );
 429+ else
 430+ $stack[] = $left % $right;
446431 break;
447432 case EXPR_PLUS:
448433 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
449434 $right = array_pop( $stack );
450435 $left = array_pop( $stack );
451 - $stack[] = $left + $right;
 436+ if ($haveBC)
 437+ $stack[] = bcadd( $left, $right );
 438+ else
 439+ $stack[] = $left + $right;
452440 break;
453441 case EXPR_MINUS:
454442 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
455443 $right = array_pop( $stack );
456444 $left = array_pop( $stack );
457 - $stack[] = $left - $right;
 445+ if ($haveBC)
 446+ $stack[] = bcsub( $left, $right );
 447+ else
 448+ $stack[] = $left - $right;
458449 break;
459450 case EXPR_AND:
460451 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
461452 $right = array_pop( $stack );
462453 $left = array_pop( $stack );
 454+ // PHP seems to treat "0" and "" appropriately for this to work.
463455 $stack[] = ( $left && $right ) ? 1 : 0;
464456 break;
465457 case EXPR_OR:
466458 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
467459 $right = array_pop( $stack );
468460 $left = array_pop( $stack );
 461+ // PHP seems to treat "0" and "" appropriately for this to work.
469462 $stack[] = ( $left || $right ) ? 1 : 0;
470463 break;
471464 case EXPR_EQUALITY:
472465 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
473466 $right = array_pop( $stack );
474467 $left = array_pop( $stack );
475 - $stack[] = ( $this->toleranceComparison( $left, $right ) == 0 ) ? 1 : 0;
 468+ if ($haveBC)
 469+ $stack[] = ( bccompWithTolerance( $left, $right ) == 0 ) ? 1 : 0;
 470+ else
 471+ $stack[] = ( $left == $right ) ? 1 : 0;
476472 break;
477473 case EXPR_NOT:
478474 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
479475 $arg = array_pop( $stack );
 476+ // PHP seems to treat "0" and "" appropriately for this to work.
480477 $stack[] = (!$arg) ? 1 : 0;
481478 break;
482479 case EXPR_ROUND:
483480 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
484481 $digits = intval( array_pop( $stack ) );
485482 $value = array_pop( $stack );
486 - $stack[] = round( $value, $digits );
 483+ if ($haveBC)
 484+ $stack[] = bcround( $value, $digits );
 485+ else
 486+ $stack[] = round( $value, $digits );
487487 break;
488488 case EXPR_LESS:
489489 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
490490 $right = array_pop( $stack );
491491 $left = array_pop( $stack );
492 - $stack[] = ( $this->toleranceComparison( $left, $right ) < 0 ) ? 1 : 0;
 492+ if ($haveBC)
 493+ $stack[] = ( bccompWithTolerance( $left, $right ) == -1 ) ? 1 : 0;
 494+ else
 495+ $stack[] = ( $left < $right ) ? 1 : 0;
493496 break;
494497 case EXPR_GREATER:
495498 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
496499 $right = array_pop( $stack );
497500 $left = array_pop( $stack );
498 - $stack[] = ( $this->toleranceComparison( $left, $right ) > 0 ) ? 1 : 0;
 501+ if ($haveBC)
 502+ $stack[] = ( bccompWithTolerance( $left, $right ) == 1 ) ? 1 : 0;
 503+ else
 504+ $stack[] = ( $left > $right ) ? 1 : 0;
499505 break;
500506 case EXPR_LESSEQ:
501507 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
502508 $right = array_pop( $stack );
503509 $left = array_pop( $stack );
504 - $stack[] = ( $this->toleranceComparison( $left, $right ) <= 0 ) ? 1 : 0;
 510+ if ($haveBC)
 511+ $stack[] = ( bccompWithTolerance( $left, $right ) != 1 ) ? 1 : 0;
 512+ else
 513+ $stack[] = ( $left <= $right ) ? 1 : 0;
505514 break;
506515 case EXPR_GREATEREQ:
507516 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
508517 $right = array_pop( $stack );
509518 $left = array_pop( $stack );
510 - $stack[] = ( $this->toleranceComparison( $left, $right ) >= 0 ) ? 1 : 0;
 519+ if ($haveBC)
 520+ $stack[] = ( bccompWithTolerance( $left, $right ) != -1 ) ? 1 : 0;
 521+ else
 522+ $stack[] = ( $left >= $right ) ? 1 : 0;
511523 break;
512524 case EXPR_NOTEQ:
513525 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
514526 $right = array_pop( $stack );
515527 $left = array_pop( $stack );
516 - $stack[] = ( $this->toleranceComparison( $left, $right ) != 0 ) ? 1 : 0;
 528+ if ($haveBC)
 529+ $stack[] = ( bccompWithTolerance( $left, $right ) != 0 ) ? 1 : 0;
 530+ else
 531+ $stack[] = ( $left != $right ) ? 1 : 0;
517532 break;
518533 case EXPR_EXPONENT:
519534 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
520535 $right = array_pop( $stack );
521536 $left = array_pop( $stack );
522 - $stack[] = $left * pow(10, $this->checkInteger( $right ) );
 537+ if ($haveBC)
 538+ $stack[] = bcmul( $left, bcpow(10, $right) );
 539+ else
 540+ $stack[] = $left * pow(10,$right);
523541 break;
524542 case EXPR_SINE:
525543 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
@@ -555,39 +573,61 @@
556574 case EXPR_EXP:
557575 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
558576 $arg = array_pop( $stack );
559 - $stack[] = exp($arg);
 577+ if ($haveBC) // Inaccurate, I know...
 578+ $stack[] = bcpow( exp(1), $arg );
 579+ else
 580+ $stack[] = exp($arg);
560581 break;
561582 case EXPR_LN:
562583 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
563584 $arg = array_pop( $stack );
564585 if ( $arg <= 0 ) throw new ExprError('invalid_argument_ln', $this->names[$op]);
565 - $stack[] = log($arg);
 586+ if ($haveBC) // ln(x) = 1^(1/e)
 587+ $stack[] = bcpow( $arg, bcdiv( 1, exp(1) ) );
 588+ else
 589+ $stack[] = log($arg);
566590 break;
567591 case EXPR_ABS:
568592 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
569593 $arg = array_pop( $stack );
570 - $stack[] = abs($arg);
 594+ if ($haveBC)
 595+ $stack[] = bcabs( $arg );
 596+ else
 597+ $stack[] = abs($arg);
571598 break;
572599 case EXPR_FLOOR:
573600 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
574601 $arg = array_pop( $stack );
575 - $stack[] = floor( $this->checkInteger( $arg ) );
 602+ if ($haveBC)
 603+ $stack[] = bcfloor( $arg );
 604+ else
 605+ $stack[] = floor($arg);
576606 break;
577607 case EXPR_TRUNC:
578608 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
579609 $arg = array_pop( $stack );
580 - $stack[] = (int)( $this->checkInteger( $arg ) );
 610+ if ($haveBC)
 611+ $stack[] = bcadd( $arg, '0', 0 );
 612+ else
 613+ $stack[] = (int)$arg;
581614 break;
582615 case EXPR_CEIL:
583616 if ( count( $stack ) < 1 ) throw new ExprError('missing_operand', $this->names[$op]);
584617 $arg = array_pop( $stack );
585 - $stack[] = ceil( $this->checkInteger( $arg ) );
 618+ $stack[] = ceil($arg);
586619 break;
587620 case EXPR_POW:
588621 if ( count( $stack ) < 2 ) throw new ExprError('missing_operand', $this->names[$op]);
589622 $right = array_pop( $stack );
590623 $left = array_pop( $stack );
591 - if ( false === ($stack[] = pow($left, $right)) ) throw new ExprError('division_by_zero', $this->names[$op]);
 624+ $result = false;
 625+ if ($haveBC)
 626+ $result = pow( $left, $right );
 627+ else
 628+ $result = bcpow( $left, $right );
 629+
 630+ if ( false === $result )
 631+ throw new ExprError('division_by_zero', $this->names[$op]);
592632 break;
593633 default:
594634 // Should be impossible to reach here.
@@ -595,3 +635,63 @@
596636 }
597637 }
598638 }
 639+
 640+function bccompWithTolerance( $left, $right ) {
 641+ $left = bcround( $left, 14 );
 642+ $right = bcround( $right, 14 );
 643+ return bccomp( $left, $right, 14 );
 644+}
 645+
 646+// User-contributed documentation by 'Charles' at
 647+// http://us3.php.net/manual/en/ref.bc.php
 648+// with modifications for readability.
 649+function bcround($strval, $precision = 0) {
 650+ if (false !== ($pos = strpos($strval, '.')) &&
 651+ ( strlen($strval) - $pos - 1) > $precision ) {
 652+ $zeros = str_repeat("0", $precision);
 653+
 654+ if ( bccomp( $strval, 0 ) >= 0 ) {
 655+ return bcadd($strval, "0.{$zeros}5", $precision);
 656+ } else {
 657+ return bcsub($strval, "0.{$zeros}5", $precision);
 658+ }
 659+ } else {
 660+ return $strval;
 661+ }
 662+}
 663+
 664+function bcabs( $strval, $precision = 0 ) {
 665+ if ( bccomp( $strval, 0 ) >= 0 ) { // Handle precision
 666+ return bcmul( $strval, 1, $precision );
 667+ } else {
 668+ return bcmul( $strval, -1, $precision );
 669+ }
 670+}
 671+
 672+function bcceil($strval, $precision = 0) {
 673+ if ( bccomp( '0', $strval ) == -1 ) {
 674+ // Negative number, just truncate.
 675+ return bcadd( '0', $strval, $precision );
 676+ } elseif( bccomp( '0', $strval ) == -1 ) {
 677+ // Positive number, truncate and maybe add one.
 678+ $truncated = bcadd( '0', $strval, $precision );
 679+
 680+ if ( bccomp( $truncated, $strval ) != 0 ) {
 681+ return bcadd( $truncated, '1', $precision );
 682+ }
 683+ } else {
 684+ // Exactly zero
 685+ return bcadd( $strval, '0', $precision );
 686+ }
 687+}
 688+
 689+function bcfloor( $strval, $precision = 0 ) {
 690+ // Cheating -- subtract 1 from bcceil :D
 691+ $ceil = bcceil( $strval, $precision );
 692+
 693+ if ( bccomp( $ceil, $strval ) == 0 ) {
 694+ return $ceil;
 695+ } else {
 696+ return bcsub( $ceil, '1', $precision );
 697+ }
 698+}
Index: trunk/extensions/ParserFunctions/testExpr.php
@@ -0,0 +1,38 @@
 2+<?php
 3+
 4+require_once ( getenv('MW_INSTALL_PATH') !== false
 5+ ? getenv('MW_INSTALL_PATH')."/maintenance/commandLine.inc"
 6+ : dirname( __FILE__ ) . '/../../maintenance/commandLine.inc' );
 7+require( 'Expr.php' );
 8+
 9+$tests = file( 'exprTests.txt' );
 10+
 11+$pass = $fail = 0;
 12+
 13+// Each test is on one line. The test must always evaluate to '1'.
 14+$parser = new ExprParser;
 15+foreach( $tests as $test ) {
 16+ $test = trim($test);
 17+ if ( in_string( ';', $test ) )
 18+ list($input,$expected) = explode(';', $test);
 19+ else {
 20+ $input = $test;
 21+ $expected = 1;
 22+ }
 23+
 24+ $expected = trim($expected);
 25+ $input = trim($input);
 26+
 27+ $result = $parser->doExpression( $input );
 28+ if ($result != $expected) {
 29+ print
 30+ "FAILING test -- $input
 31+ gave a final result of $result, instead of $expected.\n";
 32+ $fail++;
 33+ } else {
 34+ print "PASSED test $test\n";
 35+ $pass++;
 36+ }
 37+}
 38+
 39+print "Passed $pass tests, failed $fail tests, out of a total of ".($pass+$fail)."\n";
\ No newline at end of file

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r46671Adds explicit round-off checking to operations that are sensitive to floating...rarohde03:56, 1 February 2009
r46683Adds explicit fractional tolerance checks to comparison functions. Currently...rarohde18:40, 1 February 2009

Comments

#Comment by Simetrical (talk | contribs)   23:23, 12 February 2009

+ function haveBC() { ... + if ($haveBC) // Set to precision of 14. + bcscale(16);

A function called "haveBC" should *not* have side-effects. This should be initialized separately.

(Why do you even need to cache this? Is extension_loaded slow?)

Also, why don't you make this configurable?

+ if ($haveBC) + $stack[] = bcmul( '-1', $arg ); + else + $stack[] = -$arg; [* about 5 million]

All this stuff is ugly. Why don't you just use compatibility functions like we usually do?

	if ( !extension_loaded( 'bcmath' ) ) {
		function bcmul( $left, $right ) { return $left * $right; }
		...
	}

etc.

+ // PHP seems to treat "0" and "" appropriately for this to work.

				$stack[] = ( $left && $right ) ? 1 : 0;

That comment sounds evil. Are you sure you can't have stuff like "0.0" gumming up the gearworks here?

+ if ($haveBC) // Inaccurate, I know... + $stack[] = bcpow( exp(1), $arg );

Why not just write in the value of e as a constant string here to a bunch of digits?

+ if ($haveBC) // ln(x) = 1^(1/e)

Smoking some of the crack, Werdna?  :) A constant can't equal ln(x). I guess you mean ln(x) = x^(1/e). But this is wrong; I don't know where you got it. ln(x) is logarithmic but x^(1/e) is polynomial. More specifically, ln(1) = 0 but 1^(1/e) = 1. ln(e) = 1 but e^(1/e) > 1. Etc.

I don't think there's any way to write log exactly in terms of the operations bcmath makes available. It's likely possible to do some kind of approximation, but I suspect only very slowly.

+function bcround($strval, $precision = 0) { ... +function bcabs( $strval, $precision = 0 ) { ... +function bcceil($strval, $precision = 0) {

These are confusing names to use. They sound like they're part of the extension.

--- trunk/extensions/ParserFunctions/testExpr.php (revision 0) +++ trunk/extensions/ParserFunctions/testExpr.php (revision 47200)

Why don't you use ordinary parser tests so that at least some people will be checking them on a regular basis?

#Comment by Werdna (talk | contribs)   23:57, 12 February 2009

Some fixes in r47203. I was confusing ln() with root base e.

Caching isn't strictly necessary, but it saves some time. Since it's checked per-operation, an average page might check it 1000-10000 times (or somewhere around that order of magnitude). Some quick checking indicates that this would be on the order of 15-150ms, not a huge time difference, but non-trivial.

You're right about '0.0', and a quick test indicates that it might give that output. I'm thinking about how to best work around that.

Writing in the value of e as a constant string would be even more inaccurate than exp(1)

#Comment by Simetrical (talk | contribs)   00:05, 13 February 2009

"Caching isn't strictly necessary, but it saves some time. Since it's checked per-operation, an average page might check it 1000-10000 times (or somewhere around that order of magnitude). Some quick checking indicates that this would be on the order of 15-150ms, not a huge time difference, but non-trivial. "

This would be avoided if you used compat functions (which I think is better style anyway).

"Writing in the value of e as a constant string would be even more inaccurate than exp(1) "

Why?

-					$stack[] = bcpow( exp(1), $arg );
+					$stack[] = bcpow( '2.71828182845904523536', $arg );
#Comment by Werdna (talk | contribs)   00:35, 13 February 2009
andrew@voltaire:~$ php -r 'print bcsub(exp(1), 2.71828182845904523536, 50 );'
0.00000000000000000000000000000000000000000000000000

Seems to make no difference, so why go with the less readable version?

#Comment by Simetrical (talk | contribs)   01:07, 13 February 2009

You left off the string delimiters, so of course it makes no difference.

0 20:07:20 /var/www/git-trunk$ php -r 'print bcsub(exp(1), "2.71828182845904523536", 20 ) . "\n";'
0.00000000000095476464

Status & tagging log