Skip to content

Commit 5d3f461

Browse files
jiangxb1987srowen
authored andcommitted
[SPARK-17506][SQL] Improve the check double values equality rule.
## What changes were proposed in this pull request? In `ExpressionEvalHelper`, we check the equality between two double values by comparing whether the expected value is within the range [target - tolerance, target + tolerance], but this can cause a negative false when the compared numerics are very large. Before: ``` val1 = 1.6358558070241E306 val2 = 1.6358558070240974E306 ExpressionEvalHelper.compareResults(val1, val2) false ``` In fact, `val1` and `val2` are but with different precisions, we should tolerant this case by comparing with percentage range, eg.,expected is within range [target - target * tolerance_percentage, target + target * tolerance_percentage]. After: ``` val1 = 1.6358558070241E306 val2 = 1.6358558070240974E306 ExpressionEvalHelper.compareResults(val1, val2) true ``` ## How was this patch tested? Exsiting testcases. Author: jiangxingbo <[email protected]> Closes #15059 from jiangxb1987/deq.
1 parent 3fe630d commit 5d3f461

File tree

2 files changed

+30
-7
lines changed

2 files changed

+30
-7
lines changed

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -170,11 +170,9 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
170170
checkEvaluation(Remainder(positiveLongLit, positiveLongLit), 0L)
171171
checkEvaluation(Remainder(negativeLongLit, negativeLongLit), 0L)
172172

173-
// TODO: the following lines would fail the test due to inconsistency result of interpret
174-
// and codegen for remainder between giant values, seems like a numeric stability issue
175-
// DataTypeTestUtils.numericTypeWithoutDecimal.foreach { tpe =>
176-
// checkConsistencyBetweenInterpretedAndCodegen(Remainder, tpe, tpe)
177-
// }
173+
DataTypeTestUtils.numericTypeWithoutDecimal.foreach { tpe =>
174+
checkConsistencyBetweenInterpretedAndCodegen(Remainder, tpe, tpe)
175+
}
178176
}
179177

180178
test("Abs") {

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala

Lines changed: 27 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.expressions
1919

2020
import org.scalacheck.Gen
2121
import org.scalactic.TripleEqualsSupport.Spread
22+
import org.scalatest.exceptions.TestFailedException
2223
import org.scalatest.prop.GeneratorDrivenPropertyChecks
2324

2425
import org.apache.spark.SparkFunSuite
@@ -289,13 +290,37 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
289290
(result, expected) match {
290291
case (result: Array[Byte], expected: Array[Byte]) =>
291292
java.util.Arrays.equals(result, expected)
292-
case (result: Double, expected: Spread[Double @unchecked]) =>
293-
expected.asInstanceOf[Spread[Double]].isWithin(result)
294293
case (result: Double, expected: Double) if result.isNaN && expected.isNaN =>
295294
true
295+
case (result: Double, expected: Double) =>
296+
relativeErrorComparison(result, expected)
296297
case (result: Float, expected: Float) if result.isNaN && expected.isNaN =>
297298
true
298299
case _ => result == expected
299300
}
300301
}
302+
303+
/**
304+
* Private helper function for comparing two values using relative tolerance.
305+
* Note that if x or y is extremely close to zero, i.e., smaller than Double.MinPositiveValue,
306+
* the relative tolerance is meaningless, so the exception will be raised to warn users.
307+
*
308+
* TODO: this duplicates functions in spark.ml.util.TestingUtils.relTol and
309+
* spark.mllib.util.TestingUtils.relTol, they could be moved to common utils sub module for the
310+
* whole spark project which does not depend on other modules. See more detail in discussion:
311+
* https://github.com/apache/spark/pull/15059#issuecomment-246940444
312+
*/
313+
private def relativeErrorComparison(x: Double, y: Double, eps: Double = 1E-8): Boolean = {
314+
val absX = math.abs(x)
315+
val absY = math.abs(y)
316+
val diff = math.abs(x - y)
317+
if (x == y) {
318+
true
319+
} else if (absX < Double.MinPositiveValue || absY < Double.MinPositiveValue) {
320+
throw new TestFailedException(
321+
s"$x or $y is extremely close to zero, so the relative tolerance is meaningless.", 0)
322+
} else {
323+
diff < eps * math.min(absX, absY)
324+
}
325+
}
301326
}

0 commit comments

Comments
 (0)