@viirya
Member

@viirya viirya commented Mar 12, 2016

JIRA: https://issues.apache.org/jira/browse/SPARK-13838

What changes were proposed in this pull request?

We should also clear the variable code in BoundReference.genCode to prevent it from being evaluated twice, as we already do in evaluateVariables.
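
A minimal sketch of the idea, assuming a hypothetical, simplified model: ExprCode below is a stand-in and genCode is a free function, not Spark's actual classes or signatures. The first reference to an input variable hands back its pending code and clears the original, so a second reference emits no code but can still read the variable by name.

```scala
// Hypothetical, simplified stand-in for Spark's ExprCode; not the real class.
final case class ExprCode(var code: String, isNull: String, value: String)

object BoundRefSketch {
  // Models a bound reference whose input comes from the current variables:
  // return a copy of the variable's code, then clear the original so that
  // a later reference emits nothing and the code is generated at most once.
  def genCode(currentVars: IndexedSeq[ExprCode], ordinal: Int): ExprCode = {
    val oev = currentVars(ordinal)
    val ev = oev.copy() // snapshot taken before clearing
    oev.code = ""       // the clearing step described above
    ev
  }

  def main(args: Array[String]): Unit = {
    val vars = IndexedSeq(ExprCode("long i = id + 1L;", "false", "i"))
    val first = genCode(vars, 0)
    val second = genCode(vars, 0)
    println(first.code)          // long i = id + 1L;
    println(second.code.isEmpty) // true
    println(second.value)        // i
  }
}
```

Only the code that computes the variable is dropped on the second call; its value name is still returned for later references.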

How was this patch tested?

Existing tests.

@viirya
Member Author

viirya commented Mar 12, 2016

cc @davies

@SparkQA

SparkQA commented Mar 12, 2016

Test build #52995 has finished for PR 11674 at commit 0068ff8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title [SPARK-XXX][SQL] Clear variable code to prevent it to be re-evaluated in BoundAttribute [SPARK-13838][SQL] Clear variable code to prevent it to be re-evaluated in BoundAttribute Mar 13, 2016
@viirya
Member Author

viirya commented Mar 17, 2016

cc @davies This is a tiny change. Do you think it is useful?

@davies
Contributor

davies commented Mar 17, 2016

LGTM, merging into master.

@asfgit asfgit closed this in 5f3bda6 Mar 17, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
…ted in BoundAttribute

JIRA: https://issues.apache.org/jira/browse/SPARK-13838
## What changes were proposed in this pull request?

We should also clear the variable code in `BoundReference.genCode` to prevent it from being evaluated twice, as we already do in `evaluateVariables`.

## How was this patch tested?

Existing tests.

Author: Liang-Chi Hsieh

Closes apache#11674 from viirya/avoid-reevaluate.
@cloud-fan
Contributor

Who would re-evaluate ctx.currentVars?

@viirya
Copy link
Member Author

viirya commented Nov 16, 2017

What if one variable is used as input to many expressions?

@cloud-fan
Contributor

For example, spark.range(10).select('id + 1 as 'i).filter('i + 'i < 4). When the filter operator consumes its input, it has already pre-evaluated i, so 'id + 1 is evaluated only once, IIUC.
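
A hypothetical sketch of why pre-evaluation means 'id + 1 runs only once. The Java-like snippets are illustrative strings, not Spark's actual generated code, and PreEvaluateSketch is an invented name. Inlining the projection into each use of 'i duplicates it, while declaring i once and reusing it does not.

```scala
// Hypothetical toy; the strings only mimic the shape of generated code.
object PreEvaluateSketch {
  val projection = "(id + 1L)" // stands in for the code of 'id + 1

  // Without a local variable, each reference to 'i re-inlines 'id + 1.
  def inlined(): String = s"boolean keep = $projection + $projection < 4L;"

  // With pre-evaluation, the consumer declares i once and both references reuse it.
  def preEvaluated(): String =
    s"long i = $projection;\nboolean keep = i + i < 4L;"

  // Counts how many times the projection's code appears in a snippet.
  def occurrences(code: String): Int =
    code.sliding(projection.length).count(_ == projection)
}
```

Here occurrences(inlined()) is 2 while occurrences(preEvaluated()) is 1, matching the point that 'id + 1 is evaluated once when i is pre-evaluated.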

@viirya
Copy link
Member Author

viirya commented Nov 17, 2017

It is correct that we should always evaluate the used variables before generating expression code. Their code is then cleared, so it won't be evaluated twice.

I think this change is just a safety guard against possibly missing such an evaluation.

@cloud-fan
Contributor

It's different from evaluateRequiredVariables: evaluateRequiredVariables pulls out the code to be evaluated and puts it at the beginning of the generated code, whereas here we just clear the code, which looks unsafe. If we can't come up with a real case, shall we revert this?
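
The contrast can be sketched with a hypothetical toy model (Var, hoist, ref, and emit are illustrative names, not Spark APIs). Hoisting emits the pending code at a fixed place at the top; clearing on first reference emits it wherever that first reference lands. Assuming, purely for illustration, that the first reference sits inside a conditional branch, the declaration ends up scoped to that branch while the second reference outside it gets no code.

```scala
// Hypothetical toy model of assembling generated code; not Spark's real API.
final case class Var(var code: String, value: String)

object HoistVsClear {
  // Prepends pending code (if any) to a use site.
  private def emit(code: String, use: String): String =
    if (code.isEmpty) use else s"$code $use"

  // Like evaluateRequiredVariables as described above: take the pending
  // code out, clear it, and let the caller put it at the top.
  def hoist(v: Var): String = { val c = v.code; v.code = ""; c }

  // Like clearing in genCode: a reference returns the pending code and
  // clears it, so the code lands wherever this first reference is placed.
  def ref(v: Var): (String, String) = { val c = v.code; v.code = ""; (c, v.value) }

  def withHoisting(): String = {
    val i = Var("long i = id + 1L;", "i")
    val top = hoist(i)
    val (c1, v1) = ref(i) // empty: already hoisted
    val (c2, v2) = ref(i)
    Seq(top, s"if (flag) { ${emit(c1, s"use($v1);")} }", emit(c2, s"use($v2);")).mkString("\n")
  }

  def clearOnly(): String = {
    val i = Var("long i = id + 1L;", "i")
    val (c1, v1) = ref(i) // declaration lands inside the branch
    val (c2, v2) = ref(i) // empty: i is out of scope at this use
    Seq(s"if (flag) { ${emit(c1, s"use($v1);")} }", emit(c2, s"use($v2);")).mkString("\n")
  }
}
```

This is only one hypothetical way clearing could go wrong; the comment above does not name a concrete case.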

@viirya
Member Author

viirya commented Nov 17, 2017

Ok. I think it should be safe to revert this.

@cloud-fan
Contributor

Thanks! Since it's a small change, we can do the revert in a future whole-stage-codegen-related PR.

@viirya
Member Author

viirya commented Nov 17, 2017

Oh, I see. Looking at the source file as it was at that time:

https://github.com/viirya/spark-1/blob/0068ff81bf9e90194ba9dc5631ac85683b9606f2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala#L68-L70

The genCode method directly returned the code of the used currentVars. I guess that returned code may previously have been pasted into the generated code, so it was better to clear the variable's code.

After several rounds of revamping, this is no longer the case.
