Contributor

@LuciferYang LuciferYang commented Jun 13, 2024

What changes were proposed in this pull request?

Following the suggestion in #43563 (review), this PR adds handling for Stream at the call sites where LazyList.force is called.

Why are the changes needed?

Even though Stream is deprecated in Scala 2.13, it has not been removed, so it is possible that some parts of Spark / Catalyst (or third-party code) continue to pass around Stream instances. Hence, we should restore the call to Stream.force wherever .force is called on LazyList, so that Streams that happen to flow to these call sites are still eagerly materialized. This also preserves compatibility.
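To illustrate the concern, here is a minimal sketch, not the actual Catalyst code. The object `StreamForceSketch` and its methods `lazyListOnly`, `streamAndLazyList`, and `countEvaluations` are hypothetical names introduced only for this example. It assumes Scala 2.13, where Stream and LazyList are distinct types, so a `LazyList[_]` pattern does not match a Stream and the Stream falls through to a generic `Seq` case without being forced.

```scala
// Illustrative sketch only; all names here are hypothetical, not Spark APIs.
// Referencing Stream produces deprecation warnings on Scala 2.13.
object StreamForceSketch {

  // Simplified call site that only forces LazyList.
  def lazyListOnly(arg: Any, f: Any => Any): Any = arg match {
    case l: LazyList[_] => l.map(f).force
    case s: Seq[_]      => s.map(f) // a Stream lands here; its tail stays unevaluated
    case other          => other
  }

  // Simplified call site with an explicit Stream branch, as this PR proposes.
  def streamAndLazyList(arg: Any, f: Any => Any): Any = arg match {
    case s: Stream[_]   => s.map(f).force
    case l: LazyList[_] => l.map(f).force
    case s: Seq[_]      => s.map(f)
    case other          => other
  }

  // Returns how many elements the transformation was applied to
  // by the time the handler returns.
  def countEvaluations(handler: (Any, Any => Any) => Any, arg: Any): Int = {
    var n = 0
    handler(arg, x => { n += 1; x })
    n
  }
}
```

Calling `countEvaluations(StreamForceSketch.lazyListOnly, Stream(1, 2, 3))` reports fewer than three applications, because the mapped Stream is returned before its tail is evaluated. The same call with `streamAndLazyList` reports all three, which is the eager materialization the explicit Stream branch is meant to preserve.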

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added new unit tests covering the new branches.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jun 13, 2024
  case d: DataType => d // Avoid unpacking Structs
- case stream: LazyList[_] => stream.map(recursiveTransform).force
+ case stream: Stream[_] => stream.map(recursiveTransform).force
+ case lazyList: LazyList[_] => lazyList.map(recursiveTransform).force

Contributor Author

cc @JoshRosen Did I understand your suggestion correctly? Thanks ~

Contributor

@LuciferYang, yes, this is exactly what I had in mind.

Contributor

@JoshRosen JoshRosen left a comment

The code changes here look good to me (and are in line with what I suggested in my comment at #43563 (review) reporting the potential issue). Thanks for also updating the unit tests to ensure test coverage of the new branches.

LGTM pending filling out a PR description.

@LuciferYang LuciferYang marked this pull request as ready for review June 14, 2024 04:27
@LuciferYang LuciferYang changed the title from "[SPARK-45685][SQL] Add handling for Stream where LazyList.force is called" to "[SPARK-45685][SQL][FOLLOWUP] Add handling for Stream where LazyList.force is called" Jun 14, 2024
@LuciferYang
Contributor Author

Merged into master for Spark 4.0. Thanks @JoshRosen ~
