@fsamuel-bs commented Nov 2, 2021

Upstream SPARK-36627 ticket and PR link (if not applicable, explain)

https://issues.apache.org/jira/browse/SPARK-36627
apache#33879

What changes were proposed in this pull request?

In JavaSerializer.JavaDeserializationStream we override ObjectInputStream.resolveClass to use the thread's contextClassLoader. However, we do not override resolveProxyClass, which is used when deserializing Java proxy objects. As a result, Spark uses the wrong class loader when deserializing proxy objects, and the job fails with the following exception:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, <host>, executor 1): java.lang.ClassNotFoundException: <class>
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:398)
	at java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829)
	at java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917)
	...
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
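
For illustration, here is a minimal standalone sketch of an ObjectInputStream subclass that resolves both regular classes and proxy classes through a caller-supplied class loader. It is not the code from this PR: ProxyAwareStreamDemo, LoaderAwareObjectInputStream, Greeter, GreetingHandler and roundTrip are hypothetical names invented for the example, and Proxy.getProxyClass (deprecated since Java 9, but still available) is one possible way to build the proxy class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

final class ProxyAwareStreamDemo {

    /** Hypothetical interface, used only to build a proxy for the demo. */
    public interface Greeter {
        String greet(String name);
    }

    /** Serializable handler, so that the proxy itself can be written to a stream. */
    static final class GreetingHandler implements InvocationHandler, Serializable {
        private static final long serialVersionUID = 1L;
        private final String prefix;

        GreetingHandler(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            if (!method.getName().equals("greet")) {
                throw new UnsupportedOperationException(method.getName());
            }
            return prefix + ", " + args[0];
        }
    }

    /** Hypothetical stream that resolves every class through the given loader. */
    static final class LoaderAwareObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderAwareObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Primitive types cannot be found by name; let the JDK handle them.
                return super.resolveClass(desc);
            }
        }

        @Override
        @SuppressWarnings("deprecation") // Proxy.getProxyClass is deprecated since Java 9
        protected Class<?> resolveProxyClass(String[] interfaces)
                throws IOException, ClassNotFoundException {
            // Load each proxied interface with the supplied loader instead of the
            // loader that ObjectInputStream would pick by default.
            Class<?>[] resolved = new Class<?>[interfaces.length];
            for (int i = 0; i < interfaces.length; i++) {
                resolved[i] = Class.forName(interfaces[i], false, loader);
            }
            try {
                return Proxy.getProxyClass(loader, resolved);
            } catch (IllegalArgumentException e) {
                throw new ClassNotFoundException(null, e);
            }
        }
    }

    /** Serializes a value to memory and reads it back through the loader-aware stream. */
    static Object roundTrip(Object value, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new LoaderAwareObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()), loader)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        Greeter original = (Greeter) Proxy.newProxyInstance(
                loader, new Class<?>[] {Greeter.class}, new GreetingHandler("Hello"));
        Greeter copy = (Greeter) roundTrip(original, loader);
        System.out.println(copy.greet("Spark")); // prints "Hello, Spark"
    }
}
```

In this demo the interface is visible to every class loader involved, so it only shows the shape of the override. The failure described above appears when the proxied interfaces are visible to the thread's context class loader but not to the loader that ObjectInputStream uses by default.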

Why are the changes needed?

Without this change, deserializing Java proxy objects in Spark fails, and the user has no workaround.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

Closes apache#33879 from fsamuel-bs/SPARK-36627.

Authored-by: Samuel Souza [email protected]
Signed-off-by: Sean Owen [email protected]

