@hvanhovell hvanhovell commented Jul 27, 2023

What changes were proposed in this pull request?

This PR decouples the Spark Connect Scala Client from Catalyst; it now uses the SQL API module instead.

There were quite a few changes we still needed to make:

  • For testing we needed a bunch of utilities. I have moved these to common-utils.
  • I have moved bits and pieces of IntervalUtils to SparkIntervalUtils.
  • A lot of small fixes.

Why are the changes needed?

This reduces the client's dependency tree from ~300 MB of deps to ~30 MB, which makes it easier to use the client when you are developing Connect applications. On top of this, the reduced dependency graph also means folks will be less affected by the client's classpath.
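
For illustration only, here is a minimal sbt sketch of an application build that pulls in just the client. The artifact coordinates, Scala version, and Spark version are assumptions and are not taken from this PR; check them against the release you are targeting.

```scala
// build.sbt (sketch only; every coordinate and version below is an assumption)
ThisBuild / scalaVersion := "2.12.18"

// Depend on the Spark Connect Scala client alone. With the client decoupled
// from Catalyst, the application is not expected to declare spark-catalyst
// or spark-sql itself.
libraryDependencies +=
  "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0"
```

Running `sbt dependencyTree` on a build like this is one way to inspect what the client actually brings onto the classpath.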

Does this PR introduce any user-facing change?

Yes. It changes the classpath exposed by the Spark Connect Scala Client.

How was this patch tested?

Existing tests.

@hvanhovell
Contributor Author

This needs #42164 to go in first.

@github-actions github-actions bot removed the DSTREAM label Jul 27, 2023
@github-actions github-actions bot removed the AVRO label Jul 28, 2023
@hvanhovell hvanhovell marked this pull request as ready for review July 28, 2023 15:28
@hvanhovell
Contributor Author

cc @amaliujia @cloud-fan @zhenlineo

@amaliujia
Contributor

It is happening!

@hvanhovell
Contributor Author

Merging this. The single test failure is from a test that is a bit flaky.

hvanhovell added a commit that referenced this pull request Jul 29, 2023
Closes #42184 from hvanhovell/SPARK-41400-v1.

Authored-by: Herman van Hovell <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit 85a4d1e)
Signed-off-by: Herman van Hovell <[email protected]>