Contributor

cloud-fan commented Dec 27, 2018

What changes were proposed in this pull request?

This is a followup of #18576

The newly added rule UpdateNullabilityInAttributeReferences does the same thing as FixNullability, so we only need to keep one of them.

This PR removes UpdateNullabilityInAttributeReferences and uses FixNullability in its place. It also renames FixNullability to UpdateAttributeNullability.

How was this patch tested?

existing tests

Contributor Author

move this rule out of the Analyzer, so that it can be used in other places.

Member

+1

Contributor Author

cloud-fan Dec 27, 2018

This case comes from UpdateNullabilityInAttributeReferences. Leaf nodes have no children, so there is no nullability to update and the case below would be a no-op for them.
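
To make the leaf-node point concrete, here is a minimal sketch of a bottom-up nullability update on a toy plan tree. `Attr`, `Node`, and `NullabilitySketch.update` are hypothetical names made up for this illustration; they are not Spark's classes, and the real rule works on Catalyst plans rather than on this simplified model. The sketch also assumes that a node's output is just the attributes it references.

```scala
// Toy model, not Spark's classes: an attribute is an id plus a nullable flag.
final case class Attr(exprId: Long, name: String, nullable: Boolean)

// A plan node references some attributes and has zero or more children.
// Assumption for brevity: a node's output is the attributes it references.
final case class Node(refs: Seq[Attr], children: Seq[Node] = Nil) {
  def output: Seq[Attr] = refs
}

object NullabilitySketch {
  // Bottom-up: fix the children first, then refresh this node's references
  // from the nullability that its (already fixed) children report.
  def update(node: Node): Node = {
    if (node.children.isEmpty) {
      // Leaf: there is nothing below it to copy nullability from,
      // so it is returned unchanged.
      node
    } else {
      val fixedChildren = node.children.map(update)
      // exprId -> nullable. If an id shows up more than once, treat it as
      // nullable when any occurrence is nullable.
      val nullabilities: Map[Long, Boolean] =
        fixedChildren.flatMap(_.output).groupBy(_.exprId).map {
          case (id, attrs) => id -> attrs.exists(_.nullable)
        }
      val fixedRefs = node.refs.map { a =>
        nullabilities.get(a.exprId).fold(a)(n => a.copy(nullable = n))
      }
      node.copy(refs = fixedRefs, children = fixedChildren)
    }
  }

  def main(args: Array[String]): Unit = {
    val leaf = Node(Seq(Attr(1L, "a", nullable = true)))
    // The parent holds a stale, non-nullable reference to the same attribute.
    val parent = Node(Seq(Attr(1L, "a", nullable = false)), Seq(leaf))
    println(update(parent).refs.head.nullable) // true: copied from the child
    println(update(leaf) == leaf)              // true: the leaf is untouched
  }
}
```

In this model the general branch would also leave a leaf unchanged, because the lookup map built from an empty child list is empty. Skipping leaves explicitly only avoids that wasted work.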

@cloud-fan
Contributor Author

cc @maropu @gatorsmile

Member

Based on the implementation of resolveOperatorsUp, if the plan is already analyzed, this rule will have no effect in the optimizer stage. Right?

Contributor Author

"if the plan is analyzed"

More precisely, when the _analyzed flag is true.

This flag is reset to false whenever the plan changes (that is, whenever the plan is copied). If it is still true, the plan has not changed since the last analysis and we don't need to update the nullability.
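
The interaction between the flag and a bottom-up traversal can be sketched with a toy model. Everything here (`Plan`, `markAnalyzed`, `withChildren`, `resolveUp`) is a hypothetical stand-in written for this note. It mimics the behaviour described above and is not Spark's actual implementation of resolveOperatorsUp.

```scala
// Toy model of the behaviour described above; the names are illustrative,
// not Spark's actual classes or methods.
final class Plan(val name: String, val children: Seq[Plan]) {
  private var analyzed: Boolean = false

  // Mark this node and everything below it as analyzed.
  def markAnalyzed(): Unit = {
    children.foreach(_.markAnalyzed())
    analyzed = true
  }

  def isAnalyzed: Boolean = analyzed

  // Copying produces a fresh node whose flag starts out false again.
  def withChildren(newChildren: Seq[Plan]): Plan = new Plan(name, newChildren)

  // Apply `rule` bottom-up, but skip any subtree whose flag is still set.
  def resolveUp(rule: PartialFunction[Plan, Plan]): Plan = {
    if (analyzed) {
      this
    } else {
      val newChildren = children.map(_.resolveUp(rule))
      val unchanged = newChildren.zip(children).forall { case (a, b) => a eq b }
      val self = if (unchanged) this else withChildren(newChildren)
      if (rule.isDefinedAt(self)) rule(self) else self
    }
  }
}

object AnalyzedFlagSketch {
  def main(args: Array[String]): Unit = {
    var calls = 0
    val countingRule: PartialFunction[Plan, Plan] = { case p => calls += 1; p }

    val plan = new Plan("project", Seq(new Plan("relation", Nil)))
    plan.markAnalyzed()
    plan.resolveUp(countingRule)
    println(calls) // 0: the flag is set, so the rule never runs

    // A copy resets the flag on the new node only.
    val rewritten = plan.withChildren(plan.children)
    rewritten.resolveUp(countingRule)
    println(calls) // 1: the copied node is visited, its flagged child is skipped
  }
}
```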

SparkQA commented Dec 27, 2018

Test build #100478 has finished for PR 23390 at commit b2398c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Since Analyzer.scala is too big, let's move this into a new file, please.

Member

How about UpdateNullability -> UpdateAttributeNullability?

@dongjoon-hyun
Member

If you don't mind, shall we have a new JIRA issue for this?

Member

nit: please update the comment inside this test: UpdateNullabilityInAttributeReferences -> UpdateNullability

cloud-fan changed the title from "[SPARK-21351][SQL][followup] reuse the FixNullability rule" to "[SPARK-26459][SQL] reuse the FixNullability rule" on Dec 28, 2018

Member

maropu commented Dec 28, 2018

LGTM

@dongjoon-hyun
Member

Could you adjust the PR title and description to mention UpdateAttributeNullability, since FixNullability no longer exists after the rename?

Member

Do we need this check even after the resolution batch?

Contributor Author

No, we don't. But it does no harm, and it helps us merge these two rules.

Member

It still looks weird to me if we call an analyzer rule in the optimizer. Our codegen impl depends on the correctness of nullability fields. I am wondering which rule could break it? join reordering?

I think our existing test cases might already cover such a case. Could you throw an exception if this rule changes the nullability in the optimizer stage? I want to know exactly which case requires running this rule in the optimizer stage.

Contributor Author

In the future, if we introduce a fixed-size array type, CreateArray could return a fixed-size array, and GetArrayItem could compute its nullability more precisely when its input is a fixed-size array.
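
As a rough illustration of that idea, the sketch below uses toy expression classes in which element access derives its nullability from a fixed-size input. `Expr`, `Col`, `FixedArray`, and `ItemAt` are hypothetical names invented here. They only loosely mirror CreateArray and GetArrayItem, and nothing in this thread says Spark would implement it this way.

```scala
// Hypothetical toy expressions; none of this is Spark's API.
sealed trait Expr { def nullable: Boolean }

final case class Col(name: String, nullable: Boolean) extends Expr

// An array built from a known, fixed list of element expressions.
final case class FixedArray(elems: Seq[Expr]) extends Expr {
  val nullable: Boolean = false
}

// Element access with a constant index.
final case class ItemAt(array: Expr, index: Int) extends Expr {
  def nullable: Boolean = array match {
    // Size is known: an in-bounds access is only as nullable as that element.
    case FixedArray(elems) if index >= 0 && index < elems.length =>
      elems(index).nullable
    // Otherwise stay conservative: the index may be out of bounds.
    case _ => true
  }
}

object ArrayNullabilitySketch {
  def main(args: Array[String]): Unit = {
    val a = Col("a", nullable = false)
    val b = Col("b", nullable = true)
    println(ItemAt(FixedArray(Seq(a, b)), 0).nullable) // false
    println(ItemAt(FixedArray(Seq(a, b)), 1).nullable) // true
    println(ItemAt(FixedArray(Seq(a, b)), 5).nullable) // true (out of bounds)
    println(ItemAt(b, 0).nullable)                     // true (size unknown)
  }
}
```

With nullability computed in the expression itself, there would be no need for a separate optimizer pass to tighten it after the array access is simplified away.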

Contributor Author

@maropu do you have more use cases? If it's the only use case, maybe we can simply remove this optimization as its use case is rare. And we can optimize it in a better way in the future.

Member

Yes, we need to understand which cases are improved and then update the nullable at the right place.

Member

@cloud-fan Removing it from the optimizer looks ok to me, but I remember the rule seems to be related to the existing tests? See: #18576 (comment)

Contributor Author

How about we accept this patch and think about removing this optimization later?

Member

yea, that sounds good to me. Thanks!

Member

When removing it in a follow-up PR, could you reopen the JIRA, too? https://issues.apache.org/jira/browse/SPARK-21351

Contributor Author

sure, feel free to merge this PR if you think it's ready to go. thanks!

SparkQA commented Dec 28, 2018

Test build #100486 has finished for PR 23390 at commit 775236a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 28, 2018

Test build #100487 has finished for PR 23390 at commit 28284e9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please

Contributor

Can't we just have a map exprId -> nullable here and use this map in the transformExpressions below?

SparkQA commented Dec 28, 2018

Test build #100492 has finished for PR 23390 at commit 28284e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan changed the title from "[SPARK-26459][SQL] reuse the FixNullability rule" to "[SPARK-26459][SQL] replace UpdateAttributeNullability with FixNullability and rename it to UpdateAttributeNullability" on Dec 28, 2018
cloud-fan changed the title from "[SPARK-26459][SQL] replace UpdateAttributeNullability with FixNullability and rename it to UpdateAttributeNullability" to "[SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability and rename it to UpdateAttributeNullability" on Dec 28, 2018
cloud-fan changed the title from "[SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability and rename it to UpdateAttributeNullability" to "[SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability" on Dec 28, 2018

Member

It seems odd that the optimization phase would turn correct nullability back into wrong nullability. Is there another way to solve this?

Contributor Author

it's not about wrong nullability, it's an optimization. see https://github.com/apache/spark/pull/23390/files#r244326583

Member

ah, I see. Then sounds like removing this optimization (https://github.com/apache/spark/pull/23390/files#r244326906) is ok.

SparkQA commented Jan 4, 2019

Test build #100732 has finished for PR 23390 at commit 5cf9850.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

maropu commented Jan 9, 2019

retest this please

Member

maropu commented Jan 9, 2019

Pending Jenkins

SparkQA commented Jan 9, 2019

Test build #100951 has finished for PR 23390 at commit 5cf9850.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

maropu commented Jan 9, 2019

retest this please

SparkQA commented Jan 9, 2019

Test build #100957 has finished for PR 23390 at commit 5cf9850.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

maropu commented Jan 9, 2019

@cloud-fan sorry, but could you resolve the conflict?

Member

maropu commented Jan 10, 2019

Thanks, pending Jenkins

SparkQA commented Jan 10, 2019

Test build #100999 has finished for PR 23390 at commit bcb5667.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit closed this in 6955638 on Jan 10, 2019

Member

maropu commented Jan 10, 2019

Thanks! merging to master.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…h FixNullability

## What changes were proposed in this pull request?

This is a followup of apache#18576

The newly added rule `UpdateNullabilityInAttributeReferences` does the same thing as `FixNullability`, so we only need to keep one of them.

This PR removes `UpdateNullabilityInAttributeReferences` and uses `FixNullability` in its place. It also renames `FixNullability` to `UpdateAttributeNullability`.

## How was this patch tested?

existing tests

Closes apache#23390 from cloud-fan/nullable.

Authored-by: Wenchen Fan
Signed-off-by: Takeshi Yamamuro
cloud-fan pushed a commit that referenced this pull request Mar 11, 2019
…e optimizer

## What changes were proposed in this pull request?
This pr removed `UpdateAttributeNullability` from the optimizer because the same logic happens in the analyzer. See SPARK-26459(#23390) for more detailed discussion.

## How was this patch tested?
N/A

Closes #23508 from maropu/SPARK-21351.

Authored-by: Takeshi Yamamuro
Signed-off-by: Wenchen Fan