[SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode #41201
Conversation
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala (outdated review thread, resolved)
cc @pralabhkumar and @holdenk from #37417
+1, looks reasonable modulo the existing suggestions (clean up the logging and tighten the test). Thanks for making this PR :)
LGTM.
Force-pushed from 485a96d to 25305b4.
It might affect other modes. A better place would be the entrypoint script, resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh, which could include the work dir in the classpath.
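The suggestion above could look roughly like the following shell fragment. This is a hypothetical sketch, not the actual entrypoint.sh; the SPARK_CLASSPATH variable name and the /opt/spark/jars/* default path are assumptions for illustration.

```shell
# Hedged sketch of an entrypoint-style classpath assembly. The variable
# name and jar location below are illustrative assumptions, not the
# real script's contents.
SPARK_CLASSPATH="/opt/spark/jars/*"

# Append (rather than prepend) $PWD so files staged into the container
# work dir become visible without shadowing Spark's own jars.
SPARK_CLASSPATH="$SPARK_CLASSPATH:$PWD"

echo "$SPARK_CLASSPATH"
```

Appending keeps Spark's jars authoritative while still making user-staged files resolvable.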
Here we have already checked for K8S cluster mode:
if (isKubernetesClusterModeDriver) {
  ...
}
This doesn't add the current working dir to the executor's classpath, right?
Just checked YARN's behavior: YARN adds the CWD to both the driver's and the executor's classpath, and it puts the CWD before the localized SPARK_CONF and HADOOP_CONF.
See resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala, line 1442 at 014685c:
addClasspathEntry(Environment.PWD.$$(), env)
To get similar behavior, I believe it would be easier to leverage entrypoint.sh when running on K8S.
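The YARN ordering described above can be sketched as a plain shell string. This is a hand-written illustration, not YARN's actual code; the `__spark_conf__` and `__spark_libs__` names mirror YARN's localized directory names but are assumptions here.

```shell
# YARN-style classpath ordering, sketched by hand: the container CWD
# first, then the localized conf dir, then the Spark jars. Entries
# other than $PWD are illustrative assumptions.
YARN_STYLE_CP="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*"

# Putting $PWD first is what addClasspathEntry(Environment.PWD.$$(), env)
# achieves: resources staged into the work dir win lookups.
echo "$YARN_STYLE_CP"
```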
will check it
How about now? @advancedxy
I checked the code: for the driver, if we just leverage entrypoint.sh, it is difficult to keep the behavior mentioned above.
So I only leverage entrypoint.sh for the executor.
LGTM.
thanks
gentle ping @dongjoon-hyun @holdenk @pralabhkumar @pan3793 for the latest change, thanks a lot.
pan3793
left a comment
Please change the title; this affects not only the driver.
Sorry, but why is this prepended? For this specific part, I'm strongly negative because it could have a side effect like spark.executor.userClassPathFirst=true. As we know, we don't recommend userClassPathFirst at all.
Thanks for the correction, will check and address it.
done
Sorry, but why is this prepended? For this specific part, I'm strongly negative because it could have a side effect like spark.executor.userClassPathFirst=true. As we know, we don't recommend userClassPathFirst at all.
Hi @dongjoon-hyun, just to be clear: are you against putting the PWD in the executor's classpath at all, or only against the PWD being first in the classpath?
In my opinion, to align with Spark on YARN's behavior, the PWD should be in both the driver's and the executor's classpath, but I'm OK with it being last or anywhere else in the classpath.
By the way, this PR puts the PWD (.) first in the driver's classpath, so if the concern is about the PWD being first, the driver may have the same issue.
Ya, the second one. I'm only worried about the prepending. :)
Then, @turboFei, would you mind doing some research to determine where the PWD should go in the executor's classpath? Sorry for the inconvenience.
And for the executor classpath, I think we can place the working directory at the tail of the classpath.
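The prepend-vs-append concern discussed above comes down to classpath resolution order: the JVM resolves a class or resource from the first matching entry, so prepending the work dir behaves like userClassPathFirst. A toy shell demonstration of that lookup rule (no JVM involved; file names and contents are made up for illustration):

```shell
# Emulate "first classpath entry wins" with a plain directory scan.
workdir=$(mktemp -d)
sparkjars=$(mktemp -d)
echo "user version"  > "$workdir/log4j2.properties"
echo "spark default" > "$sparkjars/log4j2.properties"

# lookup CLASSPATH RESOURCE: print the content from the first classpath
# entry that contains the resource, mimicking JVM resolution order.
lookup() {
  _cp=$1; _res=$2
  old_ifs=$IFS; IFS=:
  for entry in $_cp; do
    if [ -f "$entry/$_res" ]; then cat "$entry/$_res"; break; fi
  done
  IFS=$old_ifs
}

prepended=$(lookup "$workdir:$sparkjars" log4j2.properties)
appended=$(lookup "$sparkjars:$workdir" log4j2.properties)
echo "prepended work dir -> $prepended"
echo "appended work dir  -> $appended"
```

With the work dir prepended, the user's file shadows Spark's; with it appended, Spark's own copy still wins, which is why appending is the safer default.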
cc @advancedxy once more because of his #41201 (comment). Now, the last commit reverted.
Force-pushed (…driver in K8S cluster mode) from fa6fa38 to 11b5288.
dongjoon-hyun
left a comment
+1, LGTM. Thank you, @turboFei and all!
thanks all!
Closes apache#41201 from turboFei/work_dir_classpath.
Authored-by: fwang12 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Yikun Jiang <[email protected]>
What changes were proposed in this pull request?
Adding working directory into classpath on the driver in K8S cluster mode.
Why are the changes needed?
After #37417, files from spark.files and spark.jars are placed in the working directory, but the Spark context classloader cannot access them because the working directory is not on the classpath by default.
This PR adds the current working directory to the classpath, so that the spark.files and spark.jars placed in the working directory are accessible to the classloader.
For example, the hive-site.xml uploaded via spark.files.
Does this PR introduce any user-facing change?
Yes, users no longer need to add the working directory to the Spark classpath manually.
How was this patch tested?
UT.
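As a usage sketch of the user-facing change, the contrast might look like the following. This is a hypothetical, non-runnable configuration fragment: the file paths and the API server placeholder are made up, and spark.driver.extraClassPath=. stands in for whatever manual workaround a user previously applied.

```shell
# Before (hypothetical workaround): add the work dir to the classpath
# by hand so an uploaded hive-site.xml is visible to the driver.
spark-submit \
  --conf spark.driver.extraClassPath=. \
  --files /path/to/hive-site.xml \
  --deploy-mode cluster --master k8s://https://<api-server> ...

# After this PR: the working directory is on the classpath already,
# so the extra conf is no longer needed.
spark-submit \
  --files /path/to/hive-site.xml \
  --deploy-mode cluster --master k8s://https://<api-server> ...
```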