Skip to content

Conversation

@szehon-ho
Copy link
Member

@szehon-ho szehon-ho commented May 20, 2024

What changes were proposed in this pull request?

This PR aims to enable spark.sql.sources.v2.bucketing.pushPartValues.enabled by default for Apache Spark 4.0.0 while keeping spark.sql.sources.v2.bucketing.enabled is false.

Why are the changes needed?

spark.sql.sources.v2.bucketing.pushPartValues.enabled was added at Apache Spark 3.4.0 and has been used as one of the datasource v2 bucketing feature. This PR will help the datasource v2 bucketing users use this feature more easily.

Note that this change is technically no-op for the default users because spark.sql.sources.v2.bucketing.enabled is false still.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AS-IS PR title seems to be partial because this PR changes two configurations.

[SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true

Also, please resolve the conflicts and revert the test coverage change, @szehon-ho .

@dongjoon-hyun
Copy link
Member

cc @viirya , @sunchao

@szehon-ho
Copy link
Member Author

Thanks @dongjoon-hyun sorry the pr was not ready. I was trying to integrate the changes from @superdiaodiao who I asaw also made a pr for the same, so we can be co-authors. Reverted the additional config change and test case, will check the test result.

@superdiaodiao
Copy link
Contributor

Thanks @dongjoon-hyun sorry the pr was not ready. I was trying to integrate the changes from @superdiaodiao who I asaw also made a pr for the same, so we can be co-authors. Reverted the additional config change and test case, will check the test result.

I will close my PR and you can continue, let's co-author this time.

@szehon-ho
Copy link
Member Author

Thanks! @dongjoon-hyun @sunchao can you take another look?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [SPARK-48329][SQL] Enable spark.sql.sources.v2.bucketing.pushPartValues.enabled by default May 21, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @szehon-ho and @superdiaodiao

Merged to master for Apache Spark 4.0.0.

@superdiaodiao
Copy link
Contributor

+1, LGTM. Thank you, @szehon-ho and @superdiaodiao

Merged to master for Apache Spark 4.0.0.

Thank you~~~
We can't wait to see the release of 4.0.0.

@dongjoon-hyun
Copy link
Member

BTW, @szehon-ho and @superdiaodiao .

The umbrella JIRA is closed one already for Apache Spark 3.3.0. I moved SPARK-48329 to a subtask of SPARK-44111 for now.

Screenshot 2024-05-21 at 10 02 39

For the other new tasks, please create a new umbrella JIRA and use it.

@superdiaodiao
Copy link
Contributor

BTW, @szehon-ho and @superdiaodiao .

The umbrella JIRA is closed one already for Apache Spark 3.3.0. I moved SPARK-48329 to a subtask of SPARK-44111 for now.

Screenshot 2024-05-21 at 10 02 39

For the other new tasks, please create a new umbrella JIRA and use it.

OK

@szehon-ho
Copy link
Member Author

@dongjoon-hyun do you have any guidance how we may have a new Spark 4.0+ tracking JIRA to link all the new SPJ items? I think it will be nice to have all of them in one list.

@dongjoon-hyun
Copy link
Member

Feel free to create a new umbrella Jira issue with a proper meaningful title instead of duplicating old Jira issue title. Then, link it to SPARK-44111, @szehon-ho . That's enough.

HyukjinKwon pushed a commit that referenced this pull request Jun 12, 2024
### What changes were proposed in this pull request?
Add docs for SPJ

### Why are the changes needed?
There are no docs describing SPJ, even though it is mentioned in migration notes:  #46673

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Checked the new text

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46745 from szehon-ho/doc_spj.

Authored-by: Szehon Ho <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants