
@maropu
Member

@maropu maropu commented Jul 3, 2017

What changes were proposed in this pull request?

This PR modifies the code so that array and map use string types by default when they are called with no arguments. This behaviour matches Hive's:

hive> CREATE TEMPORARY TABLE t1 AS SELECT map();
hive> DESCRIBE t1;
_c0   map<string,string>                          

hive> CREATE TEMPORARY TABLE t2 AS SELECT array();
hive> DESCRIBE t2;
_c0   array<string>
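
To make the new default concrete, here is a minimal sketch in plain Scala. It uses a simplified, hypothetical type model (the DataType, ArrayType, and MapType below are stand-ins, not Spark's Catalyst classes). It assumes the element, key, and value types come from the first relevant argument and fall back to string when there is none. It ignores nullability and does not check that all keys share one type.

```scala
// Hypothetical, simplified type model: these are NOT Spark's Catalyst classes.
object EmptyCollectionTypes {
  sealed trait DataType { def simpleString: String }
  case object StringType extends DataType { val simpleString = "string" }
  case object IntegerType extends DataType { val simpleString = "int" }

  final case class ArrayType(elementType: DataType) extends DataType {
    def simpleString: String = s"array<${elementType.simpleString}>"
  }

  final case class MapType(keyType: DataType, valueType: DataType) extends DataType {
    def simpleString: String = s"map<${keyType.simpleString},${valueType.simpleString}>"
  }

  // array(e1, e2, ...): take the element type from the first argument;
  // with no arguments, default to string (the Hive-compatible behaviour).
  def arrayType(children: Seq[DataType]): ArrayType =
    ArrayType(children.headOption.getOrElse(StringType))

  // map(k1, v1, k2, v2, ...): the first key is at index 0 and the first value at index 1;
  // a missing key or value type defaults to string.
  def mapType(children: Seq[DataType]): MapType =
    MapType(children.lift(0).getOrElse(StringType), children.lift(1).getOrElse(StringType))

  def main(args: Array[String]): Unit = {
    println(arrayType(Nil).simpleString)                        // array<string>
    println(mapType(Nil).simpleString)                          // map<string,string>
    println(mapType(Seq(IntegerType, StringType)).simpleString) // map<int,string>
  }
}
```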

How was this patch tested?

Added tests in DataFrameFunctionsSuite.

errMsg = intercept[IllegalArgumentException] {
spark.range(1).select(greatest())
}.getMessage
assert(errMsg.contains("requirement failed: greatest requires at least 2 arguments"))
Member Author

Nit: the exception types differ between sql.functions.greatest and selectExpr("greatest()"):

scala> spark.range(1).select(greatest()).show
java.lang.IllegalArgumentException: requirement failed: greatest requires at least 2 arguments.
  at scala.Predef$.require(Predef.scala:224)
  at org.apache.spark.sql.functions$.greatest(functions.scala:1570)
  ... 48 elided

scala> spark.range(1).selectExpr("greatest()").show
org.apache.spark.sql.AnalysisException: cannot resolve 'greatest()' due to data type mismatch: GREATEST requires at least 2 arguments; line 1 pos 0;
'Project [unresolvedalias(greatest(), Some(<function1>))]
+- Range (0, 1, step=1, splits=Some(4))

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:95)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:87)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
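
The difference comes from where the arity check runs. A minimal sketch, assuming two hypothetical functions that only mimic the two styles: the DSL path validates eagerly with Predef.require, which throws IllegalArgumentException with a "requirement failed: " prefix, while the SQL path returns a failure value that an analyzer would later report as an AnalysisException.

```scala
// Hypothetical sketch of the two failure paths; neither function is Spark's code.
object GreatestArityChecks {
  // DSL style (like sql.functions.greatest): validates eagerly with require,
  // which throws IllegalArgumentException("requirement failed: " + message).
  def greatestDsl(exprs: String*): String = {
    require(exprs.length >= 2, "greatest requires at least 2 arguments")
    s"greatest(${exprs.mkString(", ")})"
  }

  // Analyzer style (like the type check behind selectExpr("greatest()")):
  // returns a failure value instead of throwing; an analyzer would later
  // report it as an AnalysisException.
  def checkGreatest(numChildren: Int): Either[String, Unit] =
    if (numChildren < 2) Left("GREATEST requires at least 2 arguments") else Right(())

  def main(args: Array[String]): Unit = {
    try println(greatestDsl())
    catch { case e: IllegalArgumentException => println(e.getMessage) } // requirement failed: greatest requires at least 2 arguments
    println(checkGreatest(0)) // Left(GREATEST requires at least 2 arguments)
  }
}
```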

Member

uh. I see.

).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
}

test("SPARK-21281 fails if functions have no argument") {
Member Author

In sql.functions, we currently have six functions that take variable-length arguments and cannot accept zero arguments.

Member

Could you create a helper function to remove this duplicated code?

Member Author

ok.
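
One possible shape for such a helper, sketched in plain Scala without ScalaTest or Spark. interceptMessage, assertFailsWithoutArguments, and the greatest/least stand-ins are hypothetical names for illustration only; a version for DataFrameFunctionsSuite could use ScalaTest's intercept around the spark.range(1).select(...) call instead.

```scala
import scala.reflect.ClassTag
import scala.util.{Failure, Success, Try}

// Hypothetical test helper; plain Scala, no ScalaTest or Spark.
object ArityTestHelper {
  // Runs `body`, expects an exception of type E, and returns its message.
  def interceptMessage[E <: Throwable](body: => Any)(implicit ct: ClassTag[E]): String =
    Try(body) match {
      case Failure(e) if ct.runtimeClass.isInstance(e) => e.getMessage
      case Failure(e) => throw new AssertionError(s"unexpected exception: $e")
      case Success(_) => throw new AssertionError(s"expected ${ct.runtimeClass.getName}")
    }

  // Replaces one copy-pasted intercept/getMessage/assert block per function.
  def assertFailsWithoutArguments(name: String, minArgs: Int, call: => Any): Unit = {
    val msg = interceptMessage[IllegalArgumentException](call)
    assert(msg.contains(s"$name requires at least $minArgs arguments"), msg)
  }

  // Stand-ins for variadic DSL functions with a minimum arity (hypothetical).
  def greatest(xs: Int*): Int = {
    require(xs.length >= 2, "greatest requires at least 2 arguments")
    xs.max
  }

  def least(xs: Int*): Int = {
    require(xs.length >= 2, "least requires at least 2 arguments")
    xs.min
  }

  def main(args: Array[String]): Unit = {
    val cases: Seq[(String, () => Any)] =
      Seq("greatest" -> (() => greatest()), "least" -> (() => least()))
    cases.foreach { case (name, f) => assertFailsWithoutArguments(name, 2, f()) }
    println("all arity checks passed")
  }
}
```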

@SparkQA

SparkQA commented Jul 3, 2017

Test build #79103 has finished for PR 18516 at commit 1e29eab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Jul 4, 2017

The Hive-related tests failed, so I checked Hive's behaviour:

hive> CREATE TEMPORARY TABLE t1 AS SELECT map();
hive> DESCRIBE t1;
_c0   map<string,string>                          

hive> CREATE TEMPORARY TABLE t2 AS SELECT array();
hive> DESCRIBE t2;
_c0   array<string>

hive> CREATE TEMPORARY TABLE t3 AS SELECT coalesce();
FAILED: SemanticException [Error 10305]: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field:  _c0

hive> CREATE TEMPORARY TABLE t4 AS SELECT struct();
hive> DESCRIBE t4;
_c0                     struct<>   

hive> CREATE TEMPORARY TABLE t5 AS SELECT greatest();
FAILED: SemanticException [Error 10015]: Line 1:37 Arguments length mismatch 'greatest': greatest requires at least 2 arguments, got 0

hive> CREATE TEMPORARY TABLE t6 AS SELECT least();
FAILED: SemanticException [Error 10015]: Line 1:37 Arguments length mismatch 'least': least requires at least 2 arguments, got 0

In Hive, array and map with no arguments use string types by default. Spark fails when struct is called with no arguments, whereas Hive seems to return an empty struct. If you get time, could you check this? @gatorsmile Thanks!

TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
override def checkInputDataTypes(): TypeCheckResult = {
if (children == Nil) {
TypeCheckResult.TypeCheckFailure("input to function coalesce cannot be empty")
Member

coalesce ?

override def checkInputDataTypes(): TypeCheckResult = {
if (children.size % 2 != 0) {
if (children == Nil) {
TypeCheckResult.TypeCheckFailure("input to function coalesce cannot be empty")
Member

coalesce ?

Member Author

oh, my bad.

@SparkQA

SparkQA commented Jul 4, 2017

Test build #79138 has finished for PR 18516 at commit 1636f97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 4, 2017

Test build #79145 has finished for PR 18516 at commit 0b492fd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

TypeCheckResult.TypeCheckSuccess
}
TypeUtils.checkTypeInputDimension(
children.map(_.dataType), s"function $prettyName", requiredMinDimension = 1)
Member

Oh, now we have a consistent message instead of "at least one argument" or "cannot be empty". :)

Member Author

ok, thanks
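
A minimal sketch of the idea behind a shared check, assuming a hypothetical checkMinArguments helper and a local TypeCheckResult stand-in (this is not Spark's TypeUtils API, and the message wording is illustrative only). Because the function name comes from the caller's pretty name, a copied check cannot report the wrong function, as happened with the coalesce message.

```scala
// Hypothetical shared arity check; NOT Spark's TypeUtils API.
object ConsistentArityMessage {
  sealed trait TypeCheckResult
  case object TypeCheckSuccess extends TypeCheckResult
  final case class TypeCheckFailure(message: String) extends TypeCheckResult

  // Every caller passes its own pretty name, so the wording is identical across
  // functions and a copied check cannot name the wrong function.
  def checkMinArguments(prettyName: String, numArgs: Int, required: Int): TypeCheckResult =
    if (numArgs >= required) TypeCheckSuccess
    else TypeCheckFailure(
      s"input to function $prettyName requires at least $required argument(s), got $numArgs")

  def main(args: Array[String]): Unit = {
    println(checkMinArguments("coalesce", 0, 1))
    println(checkMinArguments("greatest", 1, 2))
    println(checkMinArguments("least", 3, 2))
  }
}
```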

@SparkQA

SparkQA commented Jul 4, 2017

Test build #79161 has finished for PR 18516 at commit 0b492fd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu force-pushed the SPARK-21281 branch 2 times, most recently from 24834fc to 28d060f July 5, 2017 01:39
@maropu maropu changed the title [SPARK-21281][SQL] Throw AnalysisException if array and map have no argument [SPARK-21281][SQL] Use string types by default if array and map have no argument Jul 5, 2017
@SparkQA

SparkQA commented Jul 5, 2017

Test build #79174 has finished for PR 18516 at commit 24834fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 5, 2017

Test build #79175 has finished for PR 18516 at commit 28d060f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 5, 2017

Test build #79180 has finished for PR 18516 at commit d9c05a6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 5, 2017

Test build #79193 has finished for PR 18516 at commit 04cbf78.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Jul 5, 2017

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 5, 2017

Test build #79209 has finished for PR 18516 at commit 04cbf78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

s" ${invalidNames.mkString(",")}")
} else if (!names.contains(null)) {
TypeCheckResult.TypeCheckSuccess
if (children.size % 2 != 0) {
Member

We can use else if to flatten the nested if.

Member Author

ok
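
A sketch of the flattened form, assuming a hypothetical checkNamedStruct over plain values in which field names sit at 0-based even indices; the messages only imitate Spark's. Each condition becomes one branch of a single if / else if chain instead of an if nested inside an else.

```scala
// Hypothetical stand-in for a named_struct-style argument check (not Spark's code).
object FlattenedChecks {
  // Arguments alternate name, value, name, value, ...; names sit at 0-based even indices.
  def checkNamedStruct(prettyName: String, children: Seq[Any]): Either[String, Unit] = {
    val names = children.indices.collect { case i if i % 2 == 0 => children(i) }
    if (children.size % 2 != 0) {
      Left(s"$prettyName expects an even number of arguments.")
    } else if (names.exists(n => n != null && !n.isInstanceOf[String])) {
      Left(s"$prettyName only allows string field names")
    } else if (names.contains(null)) {
      Left("Field name should not be null")
    } else {
      Right(())
    }
  }

  def main(args: Array[String]): Unit = {
    println(checkNamedStruct("named_struct", Seq("a", 1, "b"))) // Left(named_struct expects an even number of arguments.)
    println(checkNamedStruct("named_struct", Seq("a", 1)))      // Right(())
  }
}
```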

@SparkQA

SparkQA commented Jul 6, 2017

Test build #79253 has finished for PR 18516 at commit f71dc27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

Member

@gatorsmile gatorsmile left a comment

LGTM

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Jul 8, 2017

Test build #79353 has finished for PR 18516 at commit f71dc27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 7896e7b Jul 8, 2017
if (children.size % 2 != 0) {
if (children.length < 1) {
TypeCheckResult.TypeCheckFailure(
s"input to function $prettyName requires at least one argument")
Contributor

This is not related to what this PR claims to do. What's the reason behind this change?

Contributor

This is a behavior change and caused a problem in #22373

Member Author

@maropu maropu Sep 10, 2018

Sorry, but I don't remember exactly.
I looked over this PR again, and I also think the modification is not related to this PR, so it's OK to revert this part.
