
Conversation

@AngersZhuuuu (Contributor)

What changes were proposed in this pull request?

As discussed in #26221 (comment),
this PR raises the entire code change and passes the original hive-thriftserver's UTs.

Changes:

  1. Implement Type in Scala, since Spark does not support all of Hive's types
  2. Implement Service/AbstractService, preparing to remove the Hive conf in the future
  3. Construct RowSet with StructType and Row (see the sketch after this list)
  4. Implement HiveAuthFactory, since the delegation token management changed between Hive 1.2.1 and 2.3.5; implement the delegation token management in Scala
  5. Move tableTypeString from SparkMetadataOperationUtils to SparkMetadataOperation
  6. Since tableTypeString now lives in SparkMetadataOperation, remove ClassicTypeMapping, HiveTableTypeMapping, TableTypeMapping and TableTypeMappingFactory
  7. Implement all operations for Spark, since they execute in a different way
  8. Add the new methods GetQueryId and SetClientInfo for Thrift protocol version V11 in ThriftCLIService
  9. Add statementId to Operation to implement GetQueryId
  10. Remove GlobalHivercFileProcessor, setFetchSize, processGlobalInitFile, etc.
  11. Remove the unused openSession and openSessionWithImpersonation in CLIService
  12. Copy the hive-thriftserver tests to spark-thriftserver and remove ThriftserverShimUtils from the test UTs, since we won't need it
  13. Implement its own Thrift client code: CLIServiceClient and ThriftCLIServiceClient
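For change 3 above, here is a minimal sketch (not the PR's actual code; the class and method names are made up for illustration, and spark-sql is assumed on the classpath) of building a row set directly from Spark's StructType and Row instead of going through Hive's TypeInfo:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Hypothetical simplified row set: holds a Spark schema plus rows and renders
    // them as strings, standing in for the Thrift TRowSet serialization the real
    // implementation would produce.
    case class SimpleRowSet(schema: StructType, rows: Seq[Row]) {
      def columnNames: Seq[String] = schema.fieldNames
      def render(): Seq[Seq[String]] =
        rows.map(row => schema.indices.map(i =>
          Option(row.get(i)).map(_.toString).getOrElse("NULL")))
    }

    object SimpleRowSetExample extends App {
      val schema = StructType(Seq(
        StructField("id", IntegerType),
        StructField("name", StringType)))
      val rowSet = SimpleRowSet(schema, Seq(Row(1, "alice"), Row(2, null)))
      println(rowSet.columnNames.mkString("\t"))          // column header
      rowSet.render().foreach(r => println(r.mkString("\t")))
    }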

Why are the changes needed?

Solve Hive version conflicts and eventually build a thriftserver that makes Hive pluggable.

Does this PR introduce any user-facing change?

Start and stop:

./sbin/start-spark-thriftserver.sh
./sbin/stop-spark-thriftserver.sh

Use beeline to connect as before.
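For example, assuming the default Thrift port 10000 on a local server (this URL is only an illustration, not new syntax introduced by this PR):

./bin/beeline -u jdbc:hive2://localhost:10000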

How was this patch tested?

UT

@juliuszsompolski (Contributor)

Thanks @AngersZhuuuu . I will have time to come back to look into it more in a few days.
In the meanwhile, could you get the PR to compile and run tests? I think that for that you have to commit the thrift generated sources after all. We could keep the maven build step as an optional manual step that can be executed when somebody wants to regenerate it, and add instructions for it to sql/thriftserver/README.md.
Can the maven plugin enforce the thrift compiler version? We could add instructions to README to use 0.9.3, and get and install it from https://www.apache.org/dist/thrift/0.9.3/. Otherwise, I found that even small thrift version differences lead to tiny annoying differences in all the generated files that make a huge diff when I regenerate it.

I will look at @yaooqinn's Kyuubi more as well. I wasn't aware of its existence before! I think it could be great if improvements from there could be integrated into mainline Spark. But I don't know how compatible it is with current thriftserver deployments? Kyuubi seems to not have an HTTP server right now, and also seems to be dependent on hive-1.2.1?

@AngersZhuuuu (Contributor, Author)

Thanks @AngersZhuuuu . I will have time to come back to look into it more in a few days.
In the meanwhile, could you get the PR to compile and run tests? I think that for that you have to commit the thrift generated sources after all. We could keep the maven build step as an optional manual step that can be executed when somebody wants to regenerate it, and add instructions for it to sql/thriftserver/README.md.

I will make it pass the build and tests.

@yaooqinn (Member) commented Nov 5, 2019

@juliuszsompolski thanks for your interest in the Kyuubi project. The Hive deps can be removed easily; they still exist only because of a lack of time, and they were not a big deal before Spark 3.0.0. The only interface for Kyuubi to access the metastore should be
through SparkSession. Kyuubi stays a standalone project just to support multiple Spark versions (2.1-2.4 for now). I am also willing to help apply Kyuubi's features to Spark. There are many more scenarios that use the cluster manager (i.e. YARN, k8s) as the compute-resource serving layer, not just inside one Spark application. Currently, Kyuubi supports only "client" mode, which hacks a bit but is solved with Spark's ChildFirstClassLoader. When "cluster" mode is supported, Kyuubi will not need to hack Spark to support multi-tenancy.

@@ -0,0 +1,363 @@
/**
* Autogenerated by Thrift Compiler (0.12.0)
Contributor

The code generated in Hive master seems to use 0.9.3. To avoid diffs, could you compile and use 0.9.3 from https://www.apache.org/dist/thrift/0.9.3/, and add to the readme to use it?

Contributor Author

The code generated in Hive master seems to use 0.9.3. To avoid diffs, could you compile and use 0.9.3 from https://www.apache.org/dist/thrift/0.9.3/, and add to the readme to use it?

Since my local Thrift is 0.12, I will rebuild this with 0.9.3.

Member

@AngersZhuuuu Could we just add hive-service-rpc to our maven dependency?

    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-service-rpc</artifactId>
      <version>2.3.6</version>
    </dependency>

Contributor

@wangyum We would then again be creating a dependency on a specific Hive version, because ThriftCLIService.java, ThriftCLIServiceClient.java and a few others depend on the exact version of the RPCs. It is better to embed these.

@@ -0,0 +1,3 @@
Thrift commands to generate files from TCLIService.thrift:
--------------------
thrift --gen java:beans,hashcode -o src/gen/thrift if/TCLIService.thrift
Contributor

Could you add generated_annotations=undated to avoid diffs?
And add a sentence like Please use Thrift 0.9.3 available from https://www.apache.org/dist/thrift/0.9.3/.
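For reference, a possible combined invocation, assuming Thrift 0.9.3 accepts the generated_annotations option suggested above (verify against your local Thrift before relying on it):

thrift --gen java:beans,hashcode,generated_annotations=undated -o src/gen/thrift if/TCLIService.thrift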

Contributor Author

Could you add generated_annotations=undated to avoid diffs?
And add a sentence like Please use Thrift 0.9.3 available from https://www.apache.org/dist/thrift/0.9.3/.

Sorry, I didn't notice this point. I'll update it later.

</dependency>
<dependency>
<groupId>${hive.group}</groupId>
<artifactId>hive-service</artifactId>
Member

Our thriftserver module is similar to hive-service. Could we adapt hive-beeline and hive-jdbc to this module?

Contributor

👍 for doing that - to have our own client as a separate module, but it is not in the scope of this PR.

For now (in one of the follow-up PRs to this), we could start by pulling the client side into a new module (sql/jdbc-client?). I think the thriftserver doesn't need to depend on hive-service and hive-beeline; these are needed only for a client.

Contributor Author

Our thriftserver module is similar to hive-service. Could we adapt hive-beeline and hive-jdbc to this module?

I couldn't agree more. It will make Spark more independent, and we can have our own JDBC scheme such as jdbc:spark, like Impala and Presto do.
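As an illustration only, a client using such a hypothetical scheme could look like the sketch below; the jdbc:spark:// URL and its driver do not exist yet, so everything here is an assumption, built only on the standard java.sql API:

    import java.sql.DriverManager

    object SparkJdbcSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical URL: the jdbc:spark scheme is the one proposed above and
        // would require a Spark-provided JDBC driver to be registered first.
        val conn = DriverManager.getConnection("jdbc:spark://localhost:10000/default")
        try {
          val stmt = conn.createStatement()
          val rs = stmt.executeQuery("SELECT 1")
          while (rs.next()) println(rs.getInt(1))
        } finally {
          conn.close()
        }
      }
    }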

@cloud-fan (Contributor)

Does this PR re-implement the thrift server on its own to get rid of Hive dependency? Is this a new architecture like https://github.com/yaooqinn/kyuubi ?

@AngersZhuuuu (Contributor, Author) commented Nov 8, 2019

Does this PR re-implement the thrift server on its own to get rid of Hive dependency? Is this a new architecture like https://github.com/yaooqinn/kyuubi ?

No. What we want to do:

  1. Make a thriftserver that covers the current functionality and fits different Hive versions, so we won't need folders like v1.2.1/v2.3.5 to support different Hive versions
  2. Build our own beeline client and JDBC client, so we can remove the reliance on hive-beeline and hive-jdbc
  3. Finally make Hive a plugin, so the thriftserver can run with or without Hive

@cloud-fan (Contributor)

Make a thriftserver that covers the current functionality and fits different Hive versions

How is this done?

@AngersZhuuuu (Contributor, Author)

How is this done?

Currently we rely only on Hive's basic classes such as HiveConf and SessionState; eventually we will extract those out as well.
For the Thrift protocol classes we build our own, and the ThriftServer framework code is our own code relying on our own Thrift protocol file.
Some classes that are independent, such as HiveAuthUtils, we re-implement in Spark's own code; because they are independent we can do this (see the sketch below).
We also remove unused code and unused functions.
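For instance, here is a minimal sketch, assuming libthrift 0.9.3 on the classpath, of what a Spark-side replacement for an independent utility like HiveAuthUtils could look like (the object and method names are assumptions, not the PR's actual code):

    import java.net.InetSocketAddress
    import org.apache.thrift.transport.{TServerSocket, TSocket}

    object SparkThriftTransports {
      // Plain (non-SSL, non-SASL) server socket for the Thrift binary transport.
      def getServerSocket(host: String, port: Int): TServerSocket =
        new TServerSocket(new InetSocketAddress(host, port))

      // Client-side socket transport with a connect/read timeout in milliseconds;
      // the caller still has to open() it before use.
      def getSocketTransport(host: String, port: Int, timeoutMs: Int): TSocket =
        new TSocket(host, port, timeoutMs)
    }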

@cloud-fan (Contributor)

Can we do it incrementally? E.g. each PR removes one file/class/method from v1.2.1/v2.3.5 and adds the necessary utils, eventually making v1.2.1/v2.3.5 disappear.

@AngersZhuuuu (Contributor, Author)

Can we do it incrementally? E.g. each PR removes one file/class/method from v1.2.1/v2.3.5 and adds the necessary utils, eventually making v1.2.1/v2.3.5 disappear.

It's hard to do it that way; we have discussed this a lot, @juliuszsompolski @wangyum.
A change based on the original hive-thriftserver would also be hard to review.

@cloud-fan (Contributor)

This PR is too hard to review. If it's necessary to spend a lot of effort to re-implement a thrift server, shall we consider merging https://github.com/yaooqinn/kyuubi into Spark?

@AngersZhuuuu (Contributor, Author)

This PR is too hard to review. If it's necessary to spend a lot of effort to re-implement a thrift server, shall we consider merging https://github.com/yaooqinn/kyuubi into Spark?

Kyuubi is based on hive-1.2.1, so if we want to use it we still need to fix that.
Its mode is one user, one SparkContext (you know this causes a lot of problems).
And Kyuubi removed a lot of functionality.

@yaooqinn (Member) commented Nov 8, 2019

This PR is too hard to review. If it's necessary to spend a lot of effort to re-implement a thrift server, shall we consider merging https://github.com/yaooqinn/kyuubi into Spark?

Kyuubi is based on hive-1.2.1, so if we want to use it we still need to fix that.

Kyuubi is only built against hive-1.2.1 to reuse its OperationLog logic, which can be replaced.

Its mode is one user, one SparkContext (you know this causes a lot of problems).

Yes, and it has been used in a production environment for more than two years; we have fixed a lot of problems. One user, one SparkContext in a single JVM is the current usage, and we may have a better approach for launching SparkContexts on the cluster.

And Kyuubi removed a lot of functionality.

The newest Kyuubi has no functionality removed, apart from the HTTP thrift server support.

@cloud-fan (Contributor)

@AngersZhuuuu do we have a doc to help review? i.e. which part is server and which part is client, how these classes interact with each other, etc.

@AngersZhuuuu (Contributor, Author)

@AngersZhuuuu do we have a doc to help review? i.e. which part is server and which part is client, how these classes interact with each other, etc.

OK, I will make it clearer.

@github-actions (bot) commented Mar 9, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the Stale label Mar 9, 2020
github-actions bot closed this Mar 10, 2020