Skip to content

Conversation

@jongyoul
Copy link
Member

What changes were proposed in this pull request?

isSharedClass returns if some classes can/should be shared or not. It checks if the classes names have some keywords or start with some names. Following the logic, it can occur unintended behaviors when a custom package has slf4j inside the package or class name. As I guess, the first intention seems to figure out the class containing org.slf4j. It would be better to change the comparison logic to name.startsWith("org.slf4j")

How was this patch tested?

This patch should pass all of the current tests and keep all of the current behaviors. In my case, I'm using ProtobufDeserializer to get a table schema from hive tables. Thus some Protobuf packages and names have slf4j inside. Without this patch, it cannot be resolved because of ClassCastException from different classloaders.

@SparkQA
Copy link

SparkQA commented Mar 20, 2018

Test build #88397 has finished for PR 20860 at commit 192ce30.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

jongyoul referenced this pull request Mar 22, 2018
This PR adds initial support for loading multiple versions of Hive in a single JVM and provides a common interface for extracting metadata from the `HiveMetastoreClient` for a given version.  This is accomplished by creating an isolated `ClassLoader` that operates according to the following rules:

 - __Shared Classes__: Java, Scala, logging, and Spark classes are delegated to `baseClassLoader`
  allowing the results of calls to the `ClientInterface` to be visible externally.
 - __Hive Classes__: new instances are loaded from `execJars`.  These classes are not
  accessible externally due to their custom loading.
 - __Barrier Classes__: Classes such as `ClientWrapper` are defined in Spark but must link to a specific version of Hive.  As a result, the bytecode is acquired from the Spark `ClassLoader` but a new copy is created for each instance of `IsolatedClientLoader`.
  This new instance is able to see a specific version of hive without using reflection where ever hive is consistent across versions. Since
  this is a unique instance, it is not visible externally other than as a generic
  `ClientInterface`, unless `isolationOn` is set to `false`.

In addition to the unit tests, I have also tested this locally against mysql instances of the Hive Metastore.  I've also successfully ported Spark SQL to run with this client, but due to the size of the changes, that will come in a follow-up PR.

By default, Hive jars are currently downloaded from Maven automatically for a given version to ease packaging and testing.  However, there is also support for specifying their location manually for deployments without internet.

Author: Michael Armbrust <[email protected]>

Closes #5851 from marmbrus/isolatedClient and squashes the following commits:

c72f6ac [Michael Armbrust] rxins comments
1e271fa [Michael Armbrust] [SPARK-6907][SQL] Isolated client for HiveMetastore
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me.

@jongyoul
Copy link
Member Author

Thank you for reviewing this PR, @dongjoon-hyun @HyukjinKwon


name.contains("slf4j") ||
name.startsWith("org.slf4j") ||
name.contains("log4j") ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the same for "log4j"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, It looks like a similar issue. I'll add it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@SparkQA
Copy link

SparkQA commented Mar 23, 2018

Test build #88538 has finished for PR 20860 at commit 2ea9b7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor

LGTM. I'm also playing around with isolated hive classloader these days.

@jerryshao
Copy link
Contributor

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Mar 29, 2018

Test build #88692 has finished for PR 20860 at commit 2ea9b7a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Mar 29, 2018

Test build #88695 has finished for PR 20860 at commit 2ea9b7a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jongyoul
Copy link
Member Author

@jerryshao Do I have anything to fix that failure?

@dongjoon-hyun
Copy link
Member

Retest this please.

@dongjoon-hyun
Copy link
Member

@jongyoul . The test suite fails sometimes. Let's retry Jenkins once more.

@jongyoul
Copy link
Member Author

@dongjoon-hyun Thanks. I'll do that next time. :-)

@jerryshao
Copy link
Contributor

Thanks, merging to master!

@dongjoon-hyun
Copy link
Member

@jerryshao . It's still under testing~

@jerryshao
Copy link
Contributor

Sorry I didn't notice it, will wait for the test finishing.

@SparkQA
Copy link

SparkQA commented Mar 30, 2018

Test build #88732 has finished for PR 20860 at commit 2ea9b7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in df05fb6 Mar 30, 2018
mshtelma pushed a commit to mshtelma/spark that referenced this pull request Apr 5, 2018
… to starting with 'org.slf4j'

## What changes were proposed in this pull request?
isSharedClass returns if some classes can/should be shared or not. It checks if the classes names have some keywords or start with some names. Following the logic, it can occur unintended behaviors when a custom package has `slf4j` inside the package or class name. As I guess, the first intention seems to figure out the class containing `org.slf4j`. It would be better to change the comparison logic to `name.startsWith("org.slf4j")`

## How was this patch tested?
This patch should pass all of the current tests and keep all of the current behaviors. In my case, I'm using ProtobufDeserializer to get a table schema from hive tables. Thus some Protobuf packages and names have `slf4j` inside. Without this patch, it cannot be resolved because of ClassCastException from different classloaders.

Author: Jongyoul Lee <[email protected]>

Closes apache#20860 from jongyoul/SPARK-23743.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants