Skip to content

Conversation

@mihailotim-db
Copy link
Contributor

@mihailotim-db mihailotim-db commented Apr 24, 2025

What changes were proposed in this pull request?

This is a followup to #50606, to address remaining issues in fixed-point analyzer and parser. This PR does the following:

  1. Set correct origin in SparkConnect and Dataframe API
  2. Handle remaining cases for order by ordinal in Dataframe API
  3. Add a test suite for Dataframe use cases since we are missing those

Why are the changes needed?

To address current issue with UnresolvedOrdinal

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new test suite and existing tests.

Was this patch authored or co-authored using generative AI tooling?

No

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/fix_ordinal_followup branch 5 times, most recently from 206a6fc to bad8633 Compare April 25, 2025 12:55
@mihailotim-db mihailotim-db changed the title fix [SPARK-51820][SQL][CONNECT] Address remaining issues for new group/order by ordinal approach Apr 25, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/fix_ordinal_followup branch from bad8633 to 6aee218 Compare April 25, 2025 13:08
@mihailoale-db
Copy link
Contributor

IIUC we changed the API/approach for functions that explicitly add Sort/Aggregate, but there are other functions/rules that do that implicitly (e.g randomSplit)? @cloud-fan are there any other places where we add those operators that should be handled?

@cloud-fan
Copy link
Contributor

We should check all DataFrame APIs, but randomSplit is not a target as the order by expression is not specified by users.

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/fix_ordinal_followup branch 3 times, most recently from b4d1ed0 to 1003a53 Compare April 28, 2025 06:58
@mihailoale-db
Copy link
Contributor

@cloud-fan Agree that randomSplit is not an issue but for example you could have a FunctionTableSubqueryArgumentExpression which produces a Sort. My concern is that we have a few other operators/expressions/rules that introduce Sort in the middle of the analysis which are not going to be handled properly (same as ones which we introduce by the API calls).

@mihailotim-db
Copy link
Contributor Author

@mihailoale-db We are only interested in what happens in early batches (before SubstituteUnresolvedOrdinals was ran) and from what I can see, there are no Sorts anywhere there

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/fix_ordinal_followup branch from 1003a53 to 1d4925f Compare April 28, 2025 09:55
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/fix_ordinal_followup branch from 1d4925f to 927bdc8 Compare April 28, 2025 10:03
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4e2fd07 Apr 29, 2025
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025
…`order` by ordinal approach

### What changes were proposed in this pull request?
This is a followup to apache#50606, to address remaining issues in fixed-point analyzer and parser. This PR does the following:

1. Set correct origin in `SparkConnect` and `Dataframe` API
2. Handle remaining cases for order by ordinal in `Dataframe` API
3. Add a test suite for Dataframe use cases since we are missing those

### Why are the changes needed?
To address current issue with `UnresolvedOrdinal`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a new test suite and existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#50699 from mihailotim-db/mihailotim-db/fix_ordinal_followup.

Authored-by: Mihailo Timotic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants