-
Notifications
You must be signed in to change notification settings - Fork 29k
[WIP][SPARK-29108][SQL] Add new module sql/thriftserver with all code and UT #26340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][SPARK-29108][SQL] Add new module sql/thriftserver with all code and UT #26340
Conversation
|
Thanks @AngersZhuuuu . I will have time to come back to look into it more in a few days. I will look at @yaooqinn Kyuubi more as well. I wasn't aware of it's existence before! I think it could be great if improvements from there could be intergrated into mainline Spark. But I don't know how compatible it is with current thriftserver deployments? Kyuubi seems to not have a HTTP server right now, and also seems to be dependent on hive-1.2.1? |
I will make it pass build and test. |
|
@juliuszsompolski thanks for instesting in kyuubi project. The hive deps can be removed easily but still existing because of lack of time and not a big deal before Spark 3.0.0. The only interface for kyuubi to access metastore should be |
| @@ -0,0 +1,363 @@ | |||
| /** | |||
| * Autogenerated by Thrift Compiler (0.12.0) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code generated in Hive master seems to use 0.9.3. To avoid diffs, could you compile and use 0.9.3 from https://www.apache.org/dist/thrift/0.9.3/, and add to the readme to use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code generated in Hive master seems to use 0.9.3. To avoid diffs, could you compile and use 0.9.3 from https://www.apache.org/dist/thrift/0.9.3/, and add to the readme to use it?
Since local is 0.12..... I will use 0.93 version to rebuild this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngersZhuuuu Could we just add hive-service-rpc to our maven dependency?
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-service-rpc</artifactId>
<version>2.3.6</version>
</dependency>There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wangyum We would be creating a dependency on a specific Hive version then again, because ThriftCLIService.java and ThriftCLIServiceClient.java and a few others depend on the exact version of the RPCs. Better to embed these.
sql/thriftserver/README.md
Outdated
| @@ -0,0 +1,3 @@ | |||
| Thrift commands to generate files from TCLIService.thrift: | |||
| -------------------- | |||
| thrift --gen java:beans,hashcode -o src/gen/thrift if/TCLIService.thrift | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add generated_annotations=undated to avoid diffs?
And add a sentence like Please use Thrift 0.9.3 available from https://www.apache.org/dist/thrift/0.9.3/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add
generated_annotations=undatedto avoid diffs?
And add a sentence likePlease use Thrift 0.9.3 available from https://www.apache.org/dist/thrift/0.9.3/.
Sorry for didn't notice this point. I'll update this later.
| </dependency> | ||
| <dependency> | ||
| <groupId>${hive.group}</groupId> | ||
| <artifactId>hive-service</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our thriftserver module is similar to hive-service. Could we adapt hive-beeline and hive-jdbc to this module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for doing that - to have our own client as a separate module, but not in the scope of this.
For now (in one of followup PR to this), we could start by pulling the client side into a new module (sql/jdbc-client?). I think thriftserver doesn't need to depend on hive-service and hive-beeline. These are needed only for a client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our
thriftservermodule is similar tohive-service. Could we adapthive-beelineandhive-jdbcto this module?
Can't agree more to do this. Will make spark more independent, and we can have our own jdbc schema as jdbc:spark such as impala and presto.
|
Does this PR re-implement the thrift server on its own to get rid of Hive dependency? Is this a new architecture like https://github.com/yaooqinn/kyuubi ? |
No, what we want to do:
|
How is this done? |
Currently only rely on hive's basic class such as HiveConf SessionState etc.. finally we will extract it out. |
|
can we do it incrementally? e.g each PR remove one file/class/method from v1.2.1/v2.3.5 and add necessary utils, and eventually make v1.2.1/v2.3.5 disappear. |
Hard to do like that, we have discussed a log about this, @juliuszsompolski @wangyum |
|
This PR is too hard to review. If it's necessary to spend a lot of effort to re-implement a thrift server, shall we consider merging https://github.com/yaooqinn/kyuubi into Spark? |
kyuubi is based on hive-1.2.1 so if we want to use, still need to fix. |
kyuubi only build with hive-1.2.1 to reuse its OperationLog logic, which can be replaced
Yes, it has been used on a production environment for more than two years,we had a lot of problems fixed. one user one sparkcontext in single jvm is the current usage, we may have a better idea to launch sparkcontext clusterly
The newest kyuubi have no function removed, only the http thrift server support |
|
@AngersZhuuuu do we have a doc to help review? i.e. which part is server and which part is client, how these classes interact with each other, etc. |
Ok, will make it more clear. |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
As comment #26221 (comment)
Raise a pr with entire code and pass origin
hive-thriftserver's UTChanges:
Typein scala since spark con't support all type of hiveService/AbstractServiceprepare for remove hive conf in futureHiveAuthFactorysince between 1.2.1/2.3.5, their delegation token managerment changed. Impelment on DelegationTokenMnagerment by scalatableTypeStringfromSparkMetadataOperationUtilstoSparkMetadataOperationtableTypeStringinSparkMetadataOperationremoveClassicTypeMapping,HiveTableTypeMapping,TableTypeMappingandTableTypeMappingFactoryGetQueryId,SetClientInfofor thrift version v11 inThriftCLIServicestatementidtoOperationfor implementGetQueryIdGlobalHivercFileProcessorsetFetchSizeprocessGlobalInitFileetcopenSessionopenSessionWithImpersonationinCLIServicehive-thriftservertest tospark-thriftserverand removeThriftSeverShimUtilsabout test UT, since we won't need it.CLIServiceClientThrictCLIServiceClientWhy are the changes needed?
Solve hive version conflics and finally build a thriftserver make hive plugability.
Does this PR introduce any user-facing change?
start and stop
use beeline connect as origin.
How was this patch tested?
UT