[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan #46953
cc. @cloud-fan @viirya Please take a look. Thanks!
DISCLAIMER: @cloud-fan and I had a discussion about how to address the lack of information. This PR is based on the agreement. Thanks @cloud-fan for the valuable input!
One side note, I see some logical nodes to override the method Probably even better if we could ensure the property of isStreaming value to be available for
    // Ancestor class could mark something on the prefix, including 'invalid'. Add a marker for
    // `streaming` only when there is no marker from ancestor class.
    if (prefixFromSuper.isEmpty && isStreaming) {
      "~"
Should we append it instead of using prefixFromSuper?
This PR proposes to keep the prefix marker to a single character (as opposed to up to two characters). This should be fine in practice: the isStreaming marker is mostly useful when looking at a plan that is already analyzed, so it's unlikely that we would need to see both an existing marker and the streaming marker at the same time.
But we could reconsider if we have more voices supporting up to two chars for not overwriting. Maybe @cloud-fan ?
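To make the two options in this thread concrete, here is a minimal, self-contained sketch. It is a toy model, not the Spark code: `Flags`, `singleCharPrefix` and `appendedPrefix` are hypothetical names, and giving the unresolved marker precedence over the others is an assumption of the sketch.

```scala
// Toy model only: Flags, singleCharPrefix and appendedPrefix are hypothetical
// names used for illustration and are not part of Spark.
object PrefixSketch {
  final case class Flags(resolved: Boolean, invalid: Boolean, streaming: Boolean)

  // Stand-in for a marker computed by an ancestor class ("!" for invalid, else empty).
  private def prefixFromSuper(f: Flags): String = if (f.invalid) "!" else ""

  // Single-character option: "~" is used only when no other marker applies.
  // Assumption: the unresolved marker takes precedence over everything else.
  def singleCharPrefix(f: Flags): String =
    if (!f.resolved) "'"
    else {
      val fromSuper = prefixFromSuper(f)
      if (fromSuper.isEmpty && f.streaming) "~" else fromSuper
    }

  // Appending option: "~" is added after any existing marker (up to two characters).
  def appendedPrefix(f: Flags): String = {
    val base = if (!f.resolved) "'" else prefixFromSuper(f)
    if (f.streaming) base + "~" else base
  }

  def main(args: Array[String]): Unit = {
    val invalidStreaming = Flags(resolved = true, invalid = true, streaming = true)
    println(singleCharPrefix(invalidStreaming)) // prints "!"
    println(appendedPrefix(invalidStreaming))   // prints "!~"
  }
}
```

The two variants differ only when a node is streaming and already carries another marker; for a resolved, valid streaming node both return "~".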
viirya
left a comment
It looks like a useful feature. It should also help users read and understand the query plan.
@viirya Thanks for the review and quick approval! What are your thoughts on the side note?
Yea, it sounds like a good direction to go. If the added
thanks, merging to master!
What changes were proposed in this pull request?
This PR proposes to introduce a marker for the isStreaming property in the text representation of a logical plan.
The marker will be ~, alongside the existing ! (invalid) and ' (unresolved) markers.
This PR proposes to keep the prefix marker to a single character (as opposed to up to two characters). This should be fine in practice: the isStreaming marker is mostly useful when looking at a plan that is already analyzed, so it's unlikely that we would need to see both an existing marker and the streaming marker at the same time.
Why are the changes needed?
This would make it much easier to track down QO issues in streaming queries. For example, here is the rule application that triggered SPARK-47305:
The bug in SPARK-47305 was that the LocalRelation above was incorrectly marked as streaming=true where it should have been streaming=false. The text representation of LocalRelation has no notion of the isStreaming flag, so from the text plan we would never know the rule had a bug. Even if we showed the value of isStreaming in LocalRelation, the subtree can be very deep in practice, and it is not friendly to go down to the leaf node to figure out the isStreaming value of the entire subtree.
After this PR, the above rule information will change as below:
Now it's obvious that the isStreaming flag of the leaf node has changed. Also, to check the isStreaming flag of the children of a Join, we just need to look at the first node of each child subtree instead of going down to the leaf nodes.
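As a rough illustration of why the marker on the first node of a subtree is enough, here is a toy tree printer. It is only a sketch: `Node`, `Leaf`, `Join` and `render` are hypothetical and do not mirror Spark's classes or its tree-string layout, and it assumes that a non-leaf node counts as streaming when any of its children is streaming.

```scala
// Toy model only: Node, Leaf, Join and render are hypothetical and are not Spark classes.
object StreamingMarkerSketch {
  sealed trait Node {
    def name: String
    def children: Seq[Node]
    // Assumption: a non-leaf node is streaming if any child is streaming.
    def isStreaming: Boolean = children.exists(_.isStreaming)
  }

  final case class Leaf(name: String, streaming: Boolean) extends Node {
    val children: Seq[Node] = Nil
    override def isStreaming: Boolean = streaming
  }

  final case class Join(left: Node, right: Node) extends Node {
    val name: String = "Join"
    val children: Seq[Node] = Seq(left, right)
  }

  // Renders one node per line, indented by depth, with "~" in front of streaming nodes.
  def render(node: Node, depth: Int = 0): String = {
    val marker = if (node.isStreaming) "~" else ""
    val line = ("  " * depth) + marker + node.name
    (line +: node.children.map(render(_, depth + 1))).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val plan = Join(Leaf("StreamingRelation", streaming = true),
                    Leaf("LocalRelation", streaming = false))
    println(render(plan))
  }
}
```

In this toy output the Join and the StreamingRelation lines start with ~ while the LocalRelation line does not, so a leaf whose flag flips between two versions of a plan shows up as a changed marker on that line.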
Does this PR introduce any user-facing change?
Yes, since the textual representation of the logical plan will change slightly. However, the change only applies to streaming Datasets, and the textual representation of a logical plan is arguably not a public API. (Keeping backward compatibility of the text is technically very hard.)
How was this patch tested?
Existing UTs serve as regression tests for batch and streaming queries. For streaming queries, this PR updated the golden file to match the change.
Was this patch authored or co-authored using generative AI tooling?
No.