Skip to content

Conversation

@beliefer
Copy link
Contributor

What changes were proposed in this pull request?

SparkConnectStreamHandler treats the proto requests from connect client and send the responses back to connect client. SparkConnectStreamHandler holds a component StreamObserver to send responses.
So I think we should keep the StreamObserver could be accessed only with SparkConnectStreamHandler.

This PR want decouple the process handle commands and the other process send responses on server side.
After this PR, we can remove MockObserver which seems is not useful.

Why are the changes needed?

Decouple handle command and send response on server side.

Does this PR introduce any user-facing change?

'No'.
Just update the inner implementation.

How was this patch tested?

Exists test cases.

@beliefer
Copy link
Contributor Author

@zhengruifeng
Copy link
Contributor

zhengruifeng commented May 30, 2023

In case of multi ExecutePlanResponses, does this change mean we can not send the first response before the last response is ready?

@beliefer
Copy link
Contributor Author

In case of multi ExecutePlanResponses, does this change mean we can not send the first response before the last response is ready?

Yeah! This is an issue. I will improve it.

@beliefer
Copy link
Contributor Author

beliefer commented Jun 2, 2023

@zhengruifeng Currently, except for handleSqlCommand, all of them only return one response. The handleSqlCommand returns two responses, the first being the result of executing the SQL command, and the second being the Metrics information. The time required to create Metrics information is minimal.

@zhengruifeng
Copy link
Contributor

cc @grundprinzip @hvanhovell what do you think of this PR?

@grundprinzip
Copy link
Contributor

Unfortunately, I think this might not be the best possible approach. Any command has the ability to send any response "right now" and expects that it will be returned. The client can change their behavior based on this and we should not simply buffer all responses. In addition, if I understand correctly, buffering the responses is not great for memory consumption reasons.

In the asynchronous query processing case, we want to immediately return a message for example that returns a query ID for the client to re-attache to.

@beliefer
Copy link
Contributor Author

beliefer commented Jun 5, 2023

In the asynchronous query processing case, we want to immediately return a message for example that returns a query ID for the client to re-attache to.

I agree your opinion. In fact, all the has implemented command will not occur the issues you commented. So, do we really have the scene you comment above ?
Note: currently, except for handleSqlCommand, all of them only return one response.

@zhengruifeng
Copy link
Contributor

even if there is no other commands returning multiple response, I don't feel strongly about the idea of introducing such a limitation. I also think we should send them asap.

@grundprinzip
Copy link
Contributor

👎 on this PR in the current state as the protocol is using a streaming response. That today this is not directly used is not a good enough answer. Users can stream responses back using custom commands from the extension registry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants