Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Support aliases for table value functions
  • Loading branch information
maropu committed May 9, 2017
commit 02e70076d42c951e9a605c31be9cdc364b652d72
Original file line number Diff line number Diff line change
Expand Up @@ -472,11 +472,11 @@ identifierComment
;

relationPrimary
: tableIdentifier sample? (AS? strictIdentifier)? #tableName
| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery
| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation
| inlineTable #inlineTableDefault2
| identifier '(' (expression (',' expression)*)? ')' #tableValuedFunction
: tableIdentifier sample? (AS? strictIdentifier)? #tableName
| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery
| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation
| inlineTable #inlineTableDefault2
| identifier '(' (expression (',' expression)*)? ')' (AS? identifier identifierList?)? #tableValuedFunction
Copy link
Contributor

@hvanhovell hvanhovell Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should put the multi-alias in a separate rule? Since it is also used by inline table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add the multi-alias anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the point. okay, I'll reconsider this. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to add this. Lets keep the multi-alias for now.

Copy link
Member Author

@maropu maropu Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, it seems I misunderstood what you pointed out. You meant should we need to support a query like SELECT * FROM [[tvf]] AS t(a, b, ...) in this pr? Yea, I know we currently support range only as a table value function though, I also think it'd be better to put a more general rule in this file. So, +1 for keeping this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, I'll update this pr to separate this rule and share it with the inline table rule.

;

inlineTable
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,19 +59,21 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] {
* A TVF maps argument lists to resolver functions that accept those arguments. Using a map
* here allows for function overloading.
*/
private type TVF = Map[ArgumentList, Seq[Any] => LogicalPlan]
private type TVF = Map[ArgumentList, (UnresolvedTableValuedFunction, Seq[Any]) => LogicalPlan]

/**
* TVF builder.
*/
private def tvf(args: (String, DataType)*)(pf: PartialFunction[Seq[Any], LogicalPlan])
: (ArgumentList, Seq[Any] => LogicalPlan) = {
private def tvf(args: (String, DataType)*)(
pf: PartialFunction[(UnresolvedTableValuedFunction, Seq[Any]), LogicalPlan])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... this signature is kind of complex. Can we try to use some kind of class/case class that encapsulates this?

Copy link
Contributor

@hvanhovell hvanhovell May 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that we should separate the aliasing from constructing the table valued function. See @gatorsmile's earlier comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll re-consider the signature.

: (ArgumentList, (UnresolvedTableValuedFunction, Seq[Any]) => LogicalPlan) = {
val failAnalysis: PartialFunction[(UnresolvedTableValuedFunction, Seq[Any]), LogicalPlan] = {
case (pf: UnresolvedTableValuedFunction, args: Seq[Any]) =>
throw new IllegalArgumentException(
"Invalid arguments for resolved function: " + args.mkString(", "))
}
(ArgumentList(args: _*),
pf orElse {
case args =>
throw new IllegalArgumentException(
"Invalid arguments for resolved function: " + args.mkString(", "))
})
(tvf: UnresolvedTableValuedFunction, args: Seq[Any]) => pf.orElse(failAnalysis)(tvf, args))
}

/**
Expand All @@ -80,37 +82,52 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] {
private val builtinFunctions: Map[String, TVF] = Map(
"range" -> Map(
/* range(end) */
tvf("end" -> LongType) { case Seq(end: Long) =>
Range(0, end, 1, None)
tvf("end" -> LongType) { case (tvf, args @ Seq(end: Long)) =>
validateInputDimension(tvf, 1)
Range(0, end, 1, None, tvf.outputNames.headOption)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of changing the output of Range, I think we can simply add a Project above Range for column alias, like what we do in the Dataframe API: spark.range(100).toDF("i")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

},

/* range(start, end) */
tvf("start" -> LongType, "end" -> LongType) { case Seq(start: Long, end: Long) =>
Range(start, end, 1, None)
tvf("start" -> LongType, "end" -> LongType) {
case (tvf, args @ Seq(start: Long, end: Long)) =>
validateInputDimension(tvf, 1)
Range(start, end, 1, None, tvf.outputNames.headOption)
},

/* range(start, end, step) */
tvf("start" -> LongType, "end" -> LongType, "step" -> LongType) {
case Seq(start: Long, end: Long, step: Long) =>
Range(start, end, step, None)
case (tvf, args @ Seq(start: Long, end: Long, step: Long)) =>
validateInputDimension(tvf, 1)
Range(start, end, step, None, tvf.outputNames.headOption)
},

/* range(start, end, step, numPartitions) */
tvf("start" -> LongType, "end" -> LongType, "step" -> LongType,
"numPartitions" -> IntegerType) {
case Seq(start: Long, end: Long, step: Long, numPartitions: Int) =>
Range(start, end, step, Some(numPartitions))
case (tvf, args @ Seq(start: Long, end: Long, step: Long, numPartitions: Int)) =>
validateInputDimension(tvf, 1)
Range(start, end, step, Some(numPartitions), tvf.outputNames.headOption)
})
)

private def validateInputDimension(tvf: UnresolvedTableValuedFunction, expectedNumCols: Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, expectedNumCols needs to be part of type/class TVF. This is the num of TVF's output arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we need to add comments to this function.

: Unit = {
if (tvf.outputNames.nonEmpty) {
val numCols = tvf.outputNames.size
if (numCols != expectedNumCols) {
tvf.failAnalysis(s"expected $expectedNumCols columns but found $numCols columns")
}
}
}

override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
case u: UnresolvedTableValuedFunction if u.functionArgs.forall(_.resolved) =>
builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match {
case Some(tvf) =>
val resolved = tvf.flatMap { case (argList, resolver) =>
argList.implicitCast(u.functionArgs) match {
case Some(casted) =>
Some(resolver(casted.map(_.eval())))
Some(resolver(u, casted.map(_.eval())))
case _ =>
None
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,10 @@ case class UnresolvedInlineTable(
* select * from range(10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this example.

* }}}
*/
case class UnresolvedTableValuedFunction(functionName: String, functionArgs: Seq[Expression])
case class UnresolvedTableValuedFunction(
functionName: String,
functionArgs: Seq[Expression],
outputNames: Seq[String])
extends LeafNode {

override def output: Seq[Attribute] = Nil
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -687,7 +687,21 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
*/
override def visitTableValuedFunction(ctx: TableValuedFunctionContext)
: LogicalPlan = withOrigin(ctx) {
UnresolvedTableValuedFunction(ctx.identifier.getText, ctx.expression.asScala.map(expression))
val (tvfName, aliasNameOption) = ctx.identifier.asScala.map(_.getText) match {
case Seq(funcName, aliasName) => (funcName, Some(aliasName))
case Seq(funcName) => (funcName, None)
}
val outputNames = if (ctx.identifierList != null) {
visitIdentifierList(ctx.identifierList)
} else {
Seq.empty
}
val plan = UnresolvedTableValuedFunction(
tvfName, ctx.expression.asScala.map(expression), outputNames)
aliasNameOption match {
case Some(aliasName) => SubqueryAlias(aliasName, plan)
case _ => plan
}
}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -498,12 +498,16 @@ case class Sort(

/** Factory for constructing new `Range` nodes. */
object Range {
def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = {
val output = StructType(StructField("id", LongType, nullable = false) :: Nil).toAttributes

def apply(start: Long, end: Long, step: Long, numSlices: Option[Int], outputName: Option[String])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make this an option? It is not optional right? The next function should either call Range(start, end, step, Some(numSlices), "id") or this function should have a default parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, ok. I'll update. Thanks!

: Range = {
val name = outputName.getOrElse("id")
val output = StructType(StructField(name, LongType, nullable = false) :: Nil).toAttributes
new Range(start, end, step, numSlices, output)
}

def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = {
Range(start, end, step, Some(numSlices))
Range(start, end, step, Some(numSlices), None)
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@ import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.Cross
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.types._
Expand Down Expand Up @@ -441,4 +440,15 @@ class AnalysisSuite extends AnalysisTest with ShouldMatchers {

checkAnalysis(SubqueryAlias("tbl", testRelation).as("tbl2"), testRelation)
}

test("SPARK-20311 range(N) as alias") {
def rangeWithAliases(outputNames: Seq[String]): LogicalPlan = {
SubqueryAlias("t", UnresolvedTableValuedFunction("range", Literal(7) :: Nil, outputNames))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also test different cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll add more tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I a bit worry though, this fix satisfies your intention? 625dbda

.select(star())
}
assertAnalysisSuccess(rangeWithAliases("a" :: Nil))
assertAnalysisError(
rangeWithAliases("a" :: "b" :: Nil),
Seq("expected 1 columns but found 2 columns"))
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -468,7 +468,18 @@ class PlanParserSuite extends PlanTest {
test("table valued function") {
assertEqual(
"select * from range(2)",
UnresolvedTableValuedFunction("range", Literal(2) :: Nil).select(star()))
UnresolvedTableValuedFunction("range", Literal(2) :: Nil, Seq.empty).select(star()))
}

test("SPARK-20311 range(N) as alias") {
assertEqual(
"select * from range(10) AS t",
SubqueryAlias("t", UnresolvedTableValuedFunction("range", Literal(10) :: Nil, Seq.empty))
.select(star()))
assertEqual(
"select * from range(7) AS t(a)",
SubqueryAlias("t", UnresolvedTableValuedFunction("range", Literal(7) :: Nil, "a" :: Nil))
.select(star()))
}

test("inline table") {
Expand Down