[SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability #23390

cloud-fan · 2018-12-27T14:25:45Z

What changes were proposed in this pull request?

This is a followup of #18576

The newly added rule UpdateNullabilityInAttributeReferences does the same thing as FixNullability, so we only need to keep one of them.

This PR removes UpdateNullabilityInAttributeReferences and uses FixNullability in its place. FixNullability is also renamed to UpdateAttributeNullability.
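
As a rough illustration of what a rule of this kind does, here is a minimal sketch in plain Scala. It does not use Spark's Catalyst classes: `Attr`, `Node`, and `updateNullability` are hypothetical names for a toy model in which each plan node refreshes the nullability of the attributes it uses from its children's output, bottom-up. Leaf nodes are returned unchanged because they have no children to copy from.

```scala
// Toy model only: Attr, Node and updateNullability are hypothetical names,
// not Spark's Catalyst API.
object NullabilitySketch {
  // An attribute identified by an id, carrying a nullable flag.
  final case class Attr(id: Int, name: String, nullable: Boolean)

  // A plan node: the attributes it references, the attributes it produces,
  // and its children. A leaf has no children.
  final case class Node(refs: Seq[Attr], output: Seq[Attr], children: Seq[Node])

  // Bottom-up pass: fix the children first, then copy the nullability of each
  // child output attribute onto the matching attributes used by this node.
  def updateNullability(node: Node): Node = {
    if (node.children.isEmpty) {
      node // leaf: nothing to copy from, so nothing changes
    } else {
      val fixedChildren = node.children.map(updateNullability)
      val fromChildren: Map[Int, Boolean] =
        fixedChildren.flatMap(_.output).map(a => a.id -> a.nullable).toMap
      def fix(a: Attr): Attr =
        fromChildren.get(a.id).fold(a)(n => a.copy(nullable = n))
      node.copy(
        refs = node.refs.map(fix),
        output = node.output.map(fix),
        children = fixedChildren)
    }
  }
}
```

In this sketch a parent that still marks attribute `1` as nullable picks up `nullable = false` once its child reports the attribute as non-nullable.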

How was this patch tested?

existing tests

cloud-fan · 2018-12-27T14:26:22Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

move this rule out of the Analyzer, so that it can be used in other places.

cloud-fan · 2018-12-27T14:28:09Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

This is from UpdateNullabilityInAttributeReferences. Leaf nodes have no children, so no nullability will be updated and the case below is a no-op for them.

cloud-fan · 2018-12-27T14:29:32Z

cc @maropu @gatorsmile

gatorsmile · 2018-12-27T18:27:46Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Based on the implementation of resolveOperatorsUp, if the plan is analyzed, this rule will have no effect in the optimizer stage. Right?

if the plan is analyzed

More precisely, the _analyzed flag is true.

This flag is reset to false if the plan changed (that is, a plan copy happened). If it's true, then the plan has not changed since the last analysis and we don't need to update the nullability.
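
The behavior described here can be pictured with a small toy model in plain Scala. This is only a sketch under assumptions: `Plan`, `setAnalyzed`, and `resolveUp` are hypothetical stand-ins, not Spark's actual `LogicalPlan` code. The point is that a bottom-up transform skips any subtree whose flag is set, and that copying a node produces a fresh node whose flag is false again.

```scala
// Toy model only: Plan, setAnalyzed and resolveUp are hypothetical stand-ins,
// not Spark's actual LogicalPlan implementation.
object AnalyzedFlagSketch {
  final class Plan(val name: String, val children: Seq[Plan]) {
    private var analyzedFlag = false

    def analyzed: Boolean = analyzedFlag

    // Mark this node and everything below it as analyzed.
    def setAnalyzed(): Unit = {
      analyzedFlag = true
      children.foreach(_.setAnalyzed())
    }

    // Copying builds a fresh node, so the flag starts out false again.
    def copy(name: String = this.name, children: Seq[Plan] = this.children): Plan =
      new Plan(name, children)

    // Bottom-up transform that leaves already-analyzed subtrees untouched.
    def resolveUp(rule: PartialFunction[Plan, Plan]): Plan = {
      if (analyzed) {
        this
      } else {
        val newChildren = children.map(_.resolveUp(rule))
        val changed = newChildren.zip(children).exists { case (n, o) => n ne o }
        val self = if (changed) copy(children = newChildren) else this
        rule.applyOrElse(self, (p: Plan) => p)
      }
    }
  }
}
```

With this model, a rule applied to a plan whose flag is set returns the same instance, while the same rule applied to a copy of that plan does run on the copied node.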

SparkQA · 2018-12-27T18:48:25Z

Test build #100478 has finished for PR 23390 at commit b2398c5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-12-28T00:00:14Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Since Analyzer.scala is too big, let's move this into a new file, please.

How about UpdateNullability -> UpdateAttributeNullability?

dongjoon-hyun · 2018-12-28T00:03:23Z

If you don't mind, shall we have a new JIRA issue for this?

maropu · 2018-12-28T03:49:05Z

...c/test/scala/org/apache/spark/sql/catalyst/optimizer/UpdateNullabilityInOptimizerSuite.scala

nit: plz update the comment inside this test: UpdateNullabilityInAttributeReferences -> UpdateNullability

maropu · 2018-12-28T04:48:46Z

LGTM

dongjoon-hyun · 2018-12-28T05:35:29Z

Could you adjust the PR title and description to mention UpdateAttributeNullability, since FixNullability no longer exists after the rename?

maropu · 2018-12-28T05:40:25Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UpdateAttributeNullability.scala

We need this check even after the resolution batch?

No, we don't. But it does no harm, and it helps us merge these two rules.

gatorsmile · 2018-12-28T05:54:39Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UpdateAttributeNullability.scala

It still looks weird to me if we call an analyzer rule in the optimizer. Our codegen impl depends on the correctness of nullability fields. I am wondering which rule could break it? join reordering?

I think our existing test cases might already have such a case. Could you throw an exception if this rule changes the nullability in the optimizer stage? I want to know exactly which case requires running this in the optimizer stage.

The original PR #18576 explains it, and the test case is https://github.com/apache/spark/pull/23390/files#diff-099c363f75cfc9011d9e08f5a8067038R29

In the future, if we introduce a fixed-size array type, then CreateArray can return a fixed-size array, and GetArrayItem can determine its nullability more precisely when the input is a fixed-size array.
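
To make the idea concrete, here is a hedged sketch in plain Scala. The expression classes below (`FixedArray`, `GetItem`, `Column`, `Literal`) are hypothetical and far simpler than Spark's CreateArray and GetArrayItem. The sketch only illustrates the reasoning: when the array size and the index are both known, the result needs to be nullable only if the selected element is.

```scala
// Toy model only: these expression classes are hypothetical and much simpler
// than Spark's CreateArray / GetArrayItem.
object ArrayNullabilitySketch {
  sealed trait Expr { def nullable: Boolean }

  // A constant integer, never null.
  final case class Literal(value: Int) extends Expr {
    val nullable = false
  }

  // A column reference with a declared nullability.
  final case class Column(name: String, nullable: Boolean) extends Expr

  // An array built from a known, fixed list of element expressions.
  final case class FixedArray(elements: Seq[Expr]) extends Expr {
    val nullable = false
  }

  final case class GetItem(array: Expr, index: Expr) extends Expr {
    def nullable: Boolean = (array, index) match {
      // Known size and a constant in-range index: the result is nullable
      // only if that particular element is.
      case (FixedArray(elems), Literal(i)) if i >= 0 && i < elems.length =>
        elems(i).nullable
      // Otherwise stay conservative: the index may be out of range or unknown.
      case _ => true
    }
  }
}
```

Any case the sketch cannot prove safe falls back to `nullable = true`, which is the conservative choice.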

@maropu do you have more use cases? If it's the only use case, maybe we can simply remove this optimization as its use case is rare. And we can optimize it in a better way in the future.

Yes, we need to understand which cases are improved and then update the nullability in the right place.

@cloud-fan Removing it from the optimizer looks ok to me, but I remember the rule seems to be related to the existing tests? See: #18576 (comment)

How about we accept this patch and think about removing this optimization later?

yea, that sounds good to me. Thanks!

When removing it in a following pr, could you reopen the jira, too? https://issues.apache.org/jira/browse/SPARK-21351

sure, feel free to merge this PR if you think it's ready to go. thanks!

SparkQA · 2018-12-28T06:04:44Z

Test build #100486 has finished for PR 23390 at commit 775236a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-12-28T08:05:02Z

Test build #100487 has finished for PR 23390 at commit 28284e9.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-12-28T08:09:30Z

Retest this please

mgaido91 · 2018-12-28T11:14:16Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UpdateAttributeNullability.scala

can't we just have a map exprId -> nullable here and use this map in the transformExpressions below?
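
A minimal sketch of that suggestion, in plain Scala rather than Catalyst: `Attr`, `nullabilityById`, and `rewrite` are hypothetical helpers. How to resolve two attributes that share an exprId but disagree on nullability is an assumption here (the sketch treats the attribute as nullable if any occurrence is).

```scala
// Toy model only: Attr, nullabilityById and rewrite are hypothetical helpers,
// not Spark's Catalyst API.
object NullabilityMapSketch {
  final case class Attr(exprId: Long, nullable: Boolean)

  // Build exprId -> nullable from the children's output attributes.
  // Assumption: if the same exprId appears more than once, treat it as
  // nullable when any occurrence is nullable.
  def nullabilityById(childOutput: Seq[Attr]): Map[Long, Boolean] =
    childOutput.groupBy(_.exprId).map { case (id, attrs) =>
      id -> attrs.exists(_.nullable)
    }

  // Rewrite the attributes used by a node with the looked-up nullability;
  // attributes that do not come from a child are left as they are.
  def rewrite(used: Seq[Attr], lookup: Map[Long, Boolean]): Seq[Attr] =
    used.map { a =>
      lookup.get(a.exprId) match {
        case Some(n) => a.copy(nullable = n)
        case None    => a
      }
    }
}
```

Building the map once per node keeps the per-attribute work to a single lookup.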

SparkQA · 2018-12-28T12:09:33Z

Test build #100492 has finished for PR 23390 at commit 28284e9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2019-01-03T14:25:41Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UpdateAttributeNullability.scala

It sounds weird that in the optimization phase we turn correct nullability into wrong nullability again. Is there another way to solve it?

it's not about wrong nullability, it's an optimization. see https://github.com/apache/spark/pull/23390/files#r244326583

ah, I see. Then sounds like removing this optimization (https://github.com/apache/spark/pull/23390/files#r244326906) is ok.

SparkQA · 2019-01-04T17:15:23Z

Test build #100732 has finished for PR 23390 at commit 5cf9850.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-01-09T04:41:03Z

retest this please

maropu · 2019-01-09T04:42:24Z

Pending Jenkins

SparkQA · 2019-01-09T07:53:17Z

Test build #100951 has finished for PR 23390 at commit 5cf9850.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-01-09T09:58:13Z

retest this please

SparkQA · 2019-01-09T14:12:07Z

Test build #100957 has finished for PR 23390 at commit 5cf9850.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-01-09T23:01:07Z

@cloud-fan sorry, but could you resolve the conflict?

maropu · 2019-01-10T02:50:34Z

Thanks, pending Jenkins

SparkQA · 2019-01-10T06:48:35Z

Test build #100999 has finished for PR 23390 at commit bcb5667.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-01-10T11:22:26Z

Thanks! merging to master.

…h FixNullability

## What changes were proposed in this pull request?

This is a followup of apache#18576

The newly added rule `UpdateNullabilityInAttributeReferences` does the same thing the `FixNullability` does, we only need to keep one of them.

This PR removes `UpdateNullabilityInAttributeReferences`, and use `FixNullability` to replace it. Also rename it to `UpdateAttributeNullability`

## How was this patch tested?

existing tests

Closes apache#23390 from cloud-fan/nullable.

Authored-by: Wenchen Fan
Signed-off-by: Takeshi Yamamuro

…e optimizer

## What changes were proposed in this pull request?

This pr removed `UpdateAttributeNullability` from the optimizer because the same logic happens in the analyzer. See SPARK-26459(#23390) for more detailed discussion.

## How was this patch tested?

N/A

Closes #23508 from maropu/SPARK-21351.

Authored-by: Takeshi Yamamuro
Signed-off-by: Wenchen Fan

cloud-fan changed the title ~~[SPARK-21351][SQL][followup] reuse the FixNullability rule~~ [SPARK-26459][SQL] reuse the FixNullability rule Dec 28, 2018

cloud-fan changed the title ~~[SPARK-26459][SQL] reuse the FixNullability rule~~ [SPARK-26459][SQL] replace UpdateAttributeNullability with FixNullability and rename it to UpdateAttributeNullability Dec 28, 2018

cloud-fan changed the title ~~[SPARK-26459][SQL] replace UpdateAttributeNullability with FixNullability and rename it to UpdateAttributeNullability~~ [SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability and rename it to UpdateAttributeNullability Dec 28, 2018

cloud-fan changed the title ~~[SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability and rename it to UpdateAttributeNullability~~ [SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability Dec 28, 2018

reuse the FixNullability rule

feb57c0

cloud-fan added 3 commits January 10, 2019 10:42

address comments

e328ba3

fix comment

5930cd7

address comment

bcb5667

cloud-fan force-pushed the nullable branch from 5cf9850 to bcb5667 Compare January 10, 2019 02:43

asfgit closed this in 6955638 Jan 10, 2019

maropu mentioned this pull request Jan 10, 2019

[SPARK-21351][SQL] Remove the UpdateAttributeNullability rule from the optimizer #23508

Closed

JoshRosen mentioned this pull request Jul 15, 2019

[SPARK-27915][SQL][WIP] Update logical Filter's output nullability based on IsNotNull conditions #24765

Closed
