-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-21281][SQL] Use string types by default if array and map have no argument #18516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| errMsg = intercept[IllegalArgumentException] { | ||
| spark.range(1).select(greatest()) | ||
| }.getMessage | ||
| assert(errMsg.contains("requirement failed: greatest requires at least 2 arguments")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Exception types are different between sql.functions.greatest and selectExpr("greatest()");
cala> spark.range(1).select(greatest()).show
java.lang.IllegalArgumentException: requirement failed: greatest requires at least 2 arguments.
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.functions$.greatest(functions.scala:1570)
... 48 elided
scala> spark.range(1).selectExpr("greatest()").show
org.apache.spark.sql.AnalysisException: cannot resolve 'greatest()' due to data type mismatch: GREATEST requires at least 2 arguments; line 1 pos 0;
'Project [unresolvedalias(greatest(), Some(<function1>))]
+- Range (0, 1, step=1, splits=Some(4))
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:95)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:87)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh. I see.
| ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_)) | ||
| } | ||
|
|
||
| test("SPARK-21281 fails if functions have no argument") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently have the six functions that have variable-length arguments and cannot stand no argument in sql.functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create a helper function for removing these duplicate codes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
|
Test build #79103 has finished for PR 18516 at commit
|
|
The hive-related tests failed, so I checked the hive behaviour; In hive, |
| TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array") | ||
| override def checkInputDataTypes(): TypeCheckResult = { | ||
| if (children == Nil) { | ||
| TypeCheckResult.TypeCheckFailure("input to function coalesce cannot be empty") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coalesce ?
| override def checkInputDataTypes(): TypeCheckResult = { | ||
| if (children.size % 2 != 0) { | ||
| if (children == Nil) { | ||
| TypeCheckResult.TypeCheckFailure("input to function coalesce cannot be empty") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coalesce ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, my bad.
|
Test build #79138 has finished for PR 18516 at commit
|
|
Test build #79145 has finished for PR 18516 at commit
|
|
Retest this please. |
| TypeCheckResult.TypeCheckSuccess | ||
| } | ||
| TypeUtils.checkTypeInputDimension( | ||
| children.map(_.dataType), s"function $prettyName", requiredMinDimension = 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now we have a consistent message instsead of at least one argument or cannot be empty. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks
|
Test build #79161 has finished for PR 18516 at commit
|
24834fc to
28d060f
Compare
|
Test build #79174 has finished for PR 18516 at commit
|
|
Test build #79175 has finished for PR 18516 at commit
|
|
Test build #79180 has finished for PR 18516 at commit
|
|
Test build #79193 has finished for PR 18516 at commit
|
|
Jenkins, retest this please. |
|
Test build #79209 has finished for PR 18516 at commit
|
| s" ${invalidNames.mkString(",")}") | ||
| } else if (!names.contains(null)) { | ||
| TypeCheckResult.TypeCheckSuccess | ||
| if (children.size % 2 != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use else if to flatten the nest if.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
Test build #79253 has finished for PR 18516 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
retest this please |
|
Test build #79353 has finished for PR 18516 at commit
|
|
Thanks! Merging to master. |
| if (children.size % 2 != 0) { | ||
| if (children.length < 1) { | ||
| TypeCheckResult.TypeCheckFailure( | ||
| s"input to function $prettyName requires at least one argument") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not related to what this PR claims to do. What's the reason behind this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a behavior change and caused a problem in #22373
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, but I don't remember correctly.
I looked over this pr again and I also think the modification is not related to this pr. So, it's ok to revert this part.
What changes were proposed in this pull request?
This pr modified code to use string types by default if
arrayandmapin functions have no argument. This behaviour is the same with Hive one;How was this patch tested?
Added tests in
DataFrameFunctionsSuite.