[SPARK-11753][SQL][test-hadoop2.2] Make allowNonNumericNumbers option work #9759
Changes from 1 commit
@@ -19,6 +19,7 @@ package org.apache.spark.sql.execution.datasources.json

```scala
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.test.SharedSQLContext
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

/**
 * Test cases for various [[JSONOptions]].
```
@@ -107,10 +108,11 @@ class JsonParsingOptionsSuite extends QueryTest with SharedSQLContext {

```scala
    // quoted non-numeric numbers should still work even allowNonNumericNumbers is off.
    testCases = Seq("""{"age": "NaN"}""", """{"age": "Infinity"}""", """{"age": "-Infinity"}""")
    val tests: Seq[Double => Boolean] = Seq(_.isNaN, _.isPosInfinity, _.isNegInfinity)
    val schema = StructType(StructField("age", DoubleType, true) :: Nil)

    testCases.zipWithIndex.foreach { case (str, idx) =>
      val rdd = spark.sparkContext.parallelize(Seq(str))
      val df = spark.read.option("allowNonNumericNumbers", "false").json(rdd)
      val df = spark.read.option("allowNonNumericNumbers", "false").schema(schema).json(rdd)

      assert(df.schema.head.name == "age")
      assert(tests(idx)(df.first().getDouble(0)))
```
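As a rough illustration of the rule this hunk tests, here is a small pure-Scala model. It is not Spark's `JacksonParser`, only a sketch under stated assumptions: for a double-typed field, quoted `"NaN"`, `"Infinity"` and `"-Infinity"` convert to a double whatever the flag says, while the same tokens written without quotes are accepted only when `allowNonNumericNumbers` is on. `JsonToken`, `Quoted`, `Bare` and `NonNumericModel` are hypothetical names, and treating every other quoted string as unconvertible is an assumption of the sketch.

```scala
// Hypothetical model of the behaviour exercised by the test; this is NOT
// Spark's JacksonParser, only a sketch of the rule under discussion.
sealed trait JsonToken
final case class Quoted(text: String) extends JsonToken // e.g. "NaN" written with quotes
final case class Bare(text: String) extends JsonToken   // e.g. NaN written without quotes

object NonNumericModel {
  // Non-numeric spellings covered by the test cases in this hunk.
  private val special: Map[String, Double] = Map(
    "NaN" -> Double.NaN,
    "Infinity" -> Double.PositiveInfinity,
    "-Infinity" -> Double.NegativeInfinity
  )

  // Plain JSON number grammar, used for ordinary bare numbers.
  private val jsonNumber = """-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?""".r

  /** Converts a token for a double-typed field; None stands for a corrupt record. */
  def toDouble(token: JsonToken, allowNonNumericNumbers: Boolean): Option[Double] =
    token match {
      // Quoted spellings are accepted regardless of the flag.
      case Quoted(s) => special.get(s)
      // Bare NaN/Infinity are only accepted when the flag is on.
      case Bare(s) if special.contains(s) =>
        if (allowNonNumericNumbers) special.get(s) else None
      // Ordinary numbers parse as usual.
      case Bare(s) if jsonNumber.pattern.matcher(s).matches() => Some(s.toDouble)
      case Bare(_) => None
    }
}
```

The sketch checks bare numbers against a JSON-number regex instead of calling `String.toDouble` directly, because Java's `Double.parseDouble` itself accepts `NaN` and `Infinity`, which would let those tokens bypass the flag.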
@@ -124,10 +126,10 @@ class JsonParsingOptionsSuite extends QueryTest with SharedSQLContext {

```scala
    val tests: Seq[Double => Boolean] = Seq(_.isNaN, _.isPosInfinity, _.isNegInfinity,
```
Member, Author: Besides, I found that "Inf" and "-Inf" don't seem to work even

Member, Author: We need to upgrade the Jackson library version to support "INF" and "-INF" (case-sensitive).
```scala
      _.isPosInfinity, _.isNegInfinity, _.isNaN, _.isPosInfinity, _.isNegInfinity,
      _.isPosInfinity, _.isNegInfinity)

    val schema = StructType(StructField("age", DoubleType, true) :: Nil)
    testCases.zipWithIndex.foreach { case (str, idx) =>
      val rdd = spark.sparkContext.parallelize(Seq(str))
      val df = spark.read.option("allowNonNumericNumbers", "true").json(rdd)
      val df = spark.read.option("allowNonNumericNumbers", "true").schema(schema).json(rdd)

      assert(df.schema.head.name == "age")
      assert(tests(idx)(df.first().getDouble(0)))
```
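The suite's `tests(idx)` lookup pairs each input with a predicate by position, so the two lists have to stay in step whenever spellings such as `INF` are added or removed. Below is a minimal sketch of the same table-driven pattern using `zip` plus an explicit length check. `TableDrivenSketch` and its `accepted` map are hypothetical, and the case-sensitive `INF`/`-INF` entries model the Jackson upgrade mentioned in the review comments as an assumption, not as confirmed parser behaviour.

```scala
// Hypothetical sketch of the suite's table-driven pattern; not Spark code.
object TableDrivenSketch {
  // Assumed set of accepted spellings, matched case-sensitively. The "INF"/"-INF"
  // entries model the Jackson upgrade discussed in review and are an assumption.
  val accepted: Map[String, Double] = Map(
    "NaN" -> Double.NaN,
    "Infinity" -> Double.PositiveInfinity,
    "-Infinity" -> Double.NegativeInfinity,
    "INF" -> Double.PositiveInfinity,
    "-INF" -> Double.NegativeInfinity
  )

  // Exact map lookup, so "Inf" or "inf" are rejected while "INF" is accepted.
  def parse(token: String): Option[Double] = accepted.get(token)

  // Pairs every input with its predicate; fails fast if the two lists drift apart.
  def check(cases: Seq[String], tests: Seq[Double => Boolean]): Boolean = {
    require(cases.length == tests.length, "each input needs exactly one predicate")
    cases.zip(tests).forall { case (str, test) => parse(str).exists(test) }
  }
}
```

`zip` on its own would silently drop unmatched entries, which is why the sketch adds `require`; the suite's `tests(idx)` instead throws an `IndexOutOfBoundsException` when there are more inputs than predicates.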
Why is it double type? Shouldn't it be a string if `allowNonNumericNumbers` is off?
From @rxin's comment: we want to support quoted non-numeric numbers when `allowNonNumericNumbers` is off.
This doesn't make sense to me. What if users really want to use `"NaN"` as a string? cc @rxin
Then I shouldn't change `InferSchema`. The tests here also need to add a few doubles. I will update it.
In other words, it should be a number when the field is inferred as double/float type.
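The rule stated here is that the field's type, not the token text, decides what a quoted `"NaN"` becomes: a string field keeps the text, while a double/float field receives the special floating-point value. Below is a minimal sketch of that rule. `FieldType`, `StringField`, `DoubleField`, `FloatField` and `QuotedValueSketch` are hypothetical stand-ins, not Spark's `DataType` hierarchy, and returning `None` for any other quoted text in a numeric field is an assumption.

```scala
// Hypothetical sketch: the declared (or inferred) field type decides how a quoted
// "NaN" is read. These types are stand-ins, not Spark's DataType classes.
sealed trait FieldType
case object StringField extends FieldType
case object DoubleField extends FieldType
case object FloatField extends FieldType

object QuotedValueSketch {
  // Quoted spellings that a numeric field is assumed to accept.
  private val special = Set("NaN", "Infinity", "-Infinity")

  /** Converts a quoted JSON string for the given field type; None means unconvertible. */
  def convert(quoted: String, fieldType: FieldType): Option[Any] = fieldType match {
    case StringField => Some(quoted) // strings stay strings, so "NaN" remains text
    // Java's parsers understand exactly these spellings, so toDouble/toFloat are safe here.
    case DoubleField if special.contains(quoted) => Some(quoted.toDouble)
    case FloatField if special.contains(quoted) => Some(quoted.toFloat)
    case _ => None
  }
}
```

With this split, a user who wants `"NaN"` as a string simply declares (or lets inference produce) a string field, and nothing is converted behind their back.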
@cloud-fan I updated it.
Non-quoted non-numeric numbers are parsed as double when the corresponding field is double/float type. This behavior is the same as before this patch.
Did I say that anywhere?
Yeah, I don't think that's what I meant there.