-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23743][SQL] Changed a comparison logic from containing 'slf4j' to starting with 'org.slf4j' #20860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #88397 has finished for PR 20860 at commit
|
This PR adds initial support for loading multiple versions of Hive in a single JVM and provides a common interface for extracting metadata from the `HiveMetastoreClient` for a given version. This is accomplished by creating an isolated `ClassLoader` that operates according to the following rules: - __Shared Classes__: Java, Scala, logging, and Spark classes are delegated to `baseClassLoader` allowing the results of calls to the `ClientInterface` to be visible externally. - __Hive Classes__: new instances are loaded from `execJars`. These classes are not accessible externally due to their custom loading. - __Barrier Classes__: Classes such as `ClientWrapper` are defined in Spark but must link to a specific version of Hive. As a result, the bytecode is acquired from the Spark `ClassLoader` but a new copy is created for each instance of `IsolatedClientLoader`. This new instance is able to see a specific version of hive without using reflection where ever hive is consistent across versions. Since this is a unique instance, it is not visible externally other than as a generic `ClientInterface`, unless `isolationOn` is set to `false`. In addition to the unit tests, I have also tested this locally against mysql instances of the Hive Metastore. I've also successfully ported Spark SQL to run with this client, but due to the size of the changes, that will come in a follow-up PR. By default, Hive jars are currently downloaded from Maven automatically for a given version to ease packaging and testing. However, there is also support for specifying their location manually for deployments without internet. Author: Michael Armbrust <[email protected]> Closes #5851 from marmbrus/isolatedClient and squashes the following commits: c72f6ac [Michael Armbrust] rxins comments 1e271fa [Michael Armbrust] [SPARK-6907][SQL] Isolated client for HiveMetastore
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to me.
|
Thank you for reviewing this PR, @dongjoon-hyun @HyukjinKwon |
|
|
||
| name.contains("slf4j") || | ||
| name.startsWith("org.slf4j") || | ||
| name.contains("log4j") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the same for "log4j"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, It looks like a similar issue. I'll add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
…org.apache.*log4j'
|
Test build #88538 has finished for PR 20860 at commit
|
|
LGTM. I'm also playing around with isolated hive classloader these days. |
|
Jenkins, retest this please. |
|
Test build #88692 has finished for PR 20860 at commit
|
|
Jenkins, retest this please. |
|
Test build #88695 has finished for PR 20860 at commit
|
|
@jerryshao Do I have anything to fix that failure? |
|
Retest this please. |
|
@jongyoul . The test suite fails sometimes. Let's retry Jenkins once more. |
|
@dongjoon-hyun Thanks. I'll do that next time. :-) |
|
Thanks, merging to master! |
|
@jerryshao . It's still under testing~ |
|
Sorry I didn't notice it, will wait for the test finishing. |
|
Test build #88732 has finished for PR 20860 at commit
|
… to starting with 'org.slf4j'
## What changes were proposed in this pull request?
isSharedClass returns if some classes can/should be shared or not. It checks if the classes names have some keywords or start with some names. Following the logic, it can occur unintended behaviors when a custom package has `slf4j` inside the package or class name. As I guess, the first intention seems to figure out the class containing `org.slf4j`. It would be better to change the comparison logic to `name.startsWith("org.slf4j")`
## How was this patch tested?
This patch should pass all of the current tests and keep all of the current behaviors. In my case, I'm using ProtobufDeserializer to get a table schema from hive tables. Thus some Protobuf packages and names have `slf4j` inside. Without this patch, it cannot be resolved because of ClassCastException from different classloaders.
Author: Jongyoul Lee <[email protected]>
Closes apache#20860 from jongyoul/SPARK-23743.
What changes were proposed in this pull request?
isSharedClass returns if some classes can/should be shared or not. It checks if the classes names have some keywords or start with some names. Following the logic, it can occur unintended behaviors when a custom package has
slf4jinside the package or class name. As I guess, the first intention seems to figure out the class containingorg.slf4j. It would be better to change the comparison logic toname.startsWith("org.slf4j")How was this patch tested?
This patch should pass all of the current tests and keep all of the current behaviors. In my case, I'm using ProtobufDeserializer to get a table schema from hive tables. Thus some Protobuf packages and names have
slf4jinside. Without this patch, it cannot be resolved because of ClassCastException from different classloaders.