@zhengruifeng commented Oct 22, 2022

What changes were proposed in this pull request?

Reimplement `crosstab` with DataFrame operations.

Why are the changes needed?

1. The SQL plan is no longer truncated.
2. The new implementation is much more scalable.
3. The existing implementation (added in v1.5.0) collects the distinct (col1, col2) pairs to the driver, while pivot (added in v2.4.0) collects only the distinct values of col2, which is a much smaller set.
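
To make the third point concrete, here is a minimal plain-Python sketch of a pivot-style crosstab. It is not the code from this PR and does not use Spark; the function name `crosstab`, the sample rows, and the `col1_col2` label for the first column are assumptions chosen for illustration. The idea it models is that only the distinct col2 values are needed up front as column headers, while the per-pair counts are produced by an aggregation (which in Spark would stay distributed, roughly in the spirit of `groupBy(col1).pivot(col2).count()`).

```python
from collections import Counter


def crosstab(rows, col1, col2):
    """Build a contingency table: one row per col1 value, one column per col2 value."""
    # Only the distinct col2 values are needed up front; they become the column headers.
    headers = sorted({str(r[col2]) for r in rows})
    # Count each (col1, col2) pair. This stands in for the grouped aggregation.
    counts = Counter((str(r[col1]), str(r[col2])) for r in rows)
    keys = sorted({str(r[col1]) for r in rows})
    # Assumed naming convention for the first column: "<col1>_<col2>".
    label = f"{col1}_{col2}"
    # Pairs that never occur are filled with 0.
    return [
        {label: k, **{h: counts.get((k, h), 0) for h in headers}}
        for k in keys
    ]


# Hypothetical sample data.
rows = [
    {"key": 1, "value": "a"},
    {"key": 1, "value": "b"},
    {"key": 2, "value": "a"},
    {"key": 1, "value": "a"},
]
table = crosstab(rows, "key", "value")
for row in table:
    print(row)
```

In this sketch the header set has one entry per distinct col2 value, whereas a pair-collecting approach would have to materialize every distinct (col1, col2) combination in one place before building the table.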

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests and manual checks.

@github-actions github-actions bot added the SQL label Oct 22, 2022
@HyukjinKwon
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the sql_stat_crosstab branch October 24, 2022 01:53
@zhengruifeng
Contributor Author

thank you @HyukjinKwon

zhengruifeng added a commit that referenced this pull request Nov 10, 2022
…tab `

### What changes were proposed in this pull request?
remove the outdated comments

### Why are the changes needed?
the limitations no longer apply after the [reimplementation](#38340)

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
doc-only change

Closes #38579 from zhengruifeng/doc_crosstab.

Lead-authored-by: Ruifeng Zheng <[email protected]>
Co-authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
Closes apache#38340 from zhengruifeng/sql_stat_crosstab.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>