Skip to content

Conversation

@rxin
Copy link
Contributor

@rxin rxin commented Jul 27, 2015

This pull request updates GenerateUnsafeProjection to support StructType. If an input struct type is backed already by an UnsafeRow, GenerateUnsafeProjection copies the bytes directly into its buffer space without any conversion. However, if the input is not an UnsafeRow, GenerateUnsafeProjection runs the code generated recursively to convert the input into an UnsafeRow and then copies it into the buffer space.

Also create a TungstenProject operator that projects data directly into UnsafeRow. Note that I'm not sure if this is the way we want to structure Unsafe+codegen operators, but we can defer that decision to follow-up pull requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a lot of code duplication between this one and createCode.

@SparkQA
Copy link

SparkQA commented Jul 27, 2015

Test build #1206 has finished for PR 7689 at commit 0cae3e8.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin rxin force-pushed the tungsten-struct-type branch from 0cae3e8 to a9600e6 Compare July 27, 2015 19:32
@rxin rxin changed the title [SPARK-9373][SQL] Support StructType in Tungsten projection [WIP] [SPARK-9373][SQL] Support StructType in Tungsten projection Jul 27, 2015
@SparkQA
Copy link

SparkQA commented Jul 27, 2015

Test build #38578 has finished for PR 7689 at commit f1bee3a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static class StructWriter
    • public static class IntervalWriter
    • case class TungstenProject(projectList: Seq[NamedExpression], child: SparkPlan) extends UnaryNode

@SparkQA
Copy link

SparkQA commented Jul 27, 2015

Test build #38590 has finished for PR 7689 at commit da6aed9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static class StructWriter
    • case class CreateStructUnsafe(children: Seq[Expression]) extends Expression
    • case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression
    • case class TungstenProject(projectList: Seq[NamedExpression], child: SparkPlan) extends UnaryNode

@rxin
Copy link
Contributor Author

rxin commented Jul 27, 2015

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Jul 28, 2015

Test build #38599 has finished for PR 7689 at commit f71b659.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static class StructWriter
    • case class CreateStructUnsafe(children: Seq[Expression]) extends Expression
    • case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression
    • case class TungstenProject(projectList: Seq[NamedExpression], child: SparkPlan) extends UnaryNode

@rxin rxin force-pushed the tungsten-struct-type branch from f71b659 to be9f377 Compare July 28, 2015 01:00
@SparkQA
Copy link

SparkQA commented Jul 28, 2015

Test build #38598 timed out for PR 7689 at commit ce94678 after a configured wait of 175m.

@SparkQA
Copy link

SparkQA commented Jul 28, 2015

Test build #38616 has finished for PR 7689 at commit be9f377.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RFormula(override val uid: String) extends Estimator[RFormulaModel] with RFormulaBase
    • public static class StructWriter
    • case class CreateStructUnsafe(children: Seq[Expression]) extends Expression
    • case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression
    • case class TungstenProject(projectList: Seq[NamedExpression], child: SparkPlan) extends UnaryNode

@JoshRosen
Copy link
Contributor

Looks like UnsafeRow.get doesn't support IntervalType:

 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1153.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1153.0 (TID 4994, localhost): java.lang.UnsupportedOperationException: Unsupported data type interval
[info]      at org.apache.spark.sql.catalyst.expressions.UnsafeRow.get(UnsafeRow.java:271)
[info]      at org.apache.spark.sql.catalyst.CatalystTypeConverters$IdentityConverter.toScalaImpl(CatalystTypeConverters.scala:144)
[info]      at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toScala(CatalystTypeConverters.scala:118)
[info]      at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toScala(CatalystTypeConverters.scala:268)
[info]      at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toScala(CatalystTypeConverters.scala:235)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: string -> struct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't obvious at first glance... since these expressions are returning names won't it be safe to just assume that they return strings and to use a .asInstanceOf[String] rather than calling toString here? Just a little wary of using .toString() during processing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I see that this is just copied from CreateNamedStruct above, so it should be fine.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, can you also add a check to ensure that codegen is enabled?

@JoshRosen
Copy link
Contributor

At a high level this seems fine to me, but I haven't looked in super-close detail yet. I'm fine with merging this pending tests and continuing to iterate in followup patches.

@rxin
Copy link
Contributor Author

rxin commented Jul 28, 2015

Thanks - going to merge this and submit a pr to fix the remaining issues tonight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants