Skip to content
Original file line number Diff line number Diff line change
Expand Up @@ -316,8 +316,10 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
protected def canonicalize(in: Seq[Expression]): Seq[Expression] =
in.map(ExpressionCanonicalizer.execute)

protected def bind(in: Seq[Expression], inputSchema: Seq[Attribute]): Seq[Expression] =
in.map(BindReferences.bindReference(_, inputSchema))
protected def bind(in: Seq[Expression], inputSchema: Seq[Attribute]): Seq[Expression] = {
lazy val inputSchemaAttrSeq: AttributeSeq = inputSchema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the lazy val? Are you optimizing for the case where in is empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the reason. For example, the query df.count, where df is a dataframe from a CSV datasource, calls GenerateUnsafeProjection.bind with am empty list of expressions.

However, the map inside the AttributeSeq object is not built until someone accesses exprIdToOrdinal, so maybe it is overkill.

Copy link
Contributor

@mgaido91 mgaido91 Dec 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about changing the signature of bind instead? This would be helpful also to ensure that we don't miss this fix in other parts of the code IMHO.

Copy link
Contributor Author

@bersprockets bersprockets Dec 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgaido91 Do you mean change it to this?:

bind(in: Seq[Expression], inputSchema: AttributeSeq): Seq[Expression]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on something that eliminates this issue wholesale.

in.map(BindReferences.bindReference(_, inputSchemaAttrSeq))
}

def generate(
expressions: Seq[Expression],
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -89,7 +89,8 @@ package object expressions {
* A helper function to bind given expressions to an input schema.
*/
def toBoundExprs(exprs: Seq[Expression], inputSchema: Seq[Attribute]): Seq[Expression] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

exprs.map(BindReferences.bindReference(_, inputSchema))
lazy val inputSchemaAttrSeq: AttributeSeq = inputSchema
exprs.map(BindReferences.bindReference(_, inputSchemaAttrSeq))
}

/**
Expand Down