[SPARK-42219][CORE] Introducing a config to close all active SparkContexts after the Main method has finished #39775
Conversation
Two resolved (outdated) review threads on core/src/main/scala/org/apache/spark/internal/config/package.scala
dongjoon-hyun left a comment
That's a bad example from the ancient YARN age, @attilapiros. I'd not reinforce those bad habits. Instead, I can give you counter examples like:
sparkConf.getBoolean("spark.kubernetes.submitInDriver", false)

spark/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala, lines 88 to 90 in aeb2a13:

if (sparkConf.getOption("spark.yarn.appMasterEnv.OMP_NUM_THREADS").isEmpty &&
    sparkConf.getOption("spark.mesos.driverEnv.OMP_NUM_THREADS").isEmpty &&
    sparkConf.getOption("spark.kubernetes.driverEnv.OMP_NUM_THREADS").isEmpty) {
@dongjoon-hyun I have moved the config into the k8s module
Two resolved review threads on resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (one outdated)
cc @holdenk

This pyspark test failure is unrelated:

cc @HyukjinKwon, @srowen

SPARK-42698 (#40314) is aiming to expand the scope of stopping SparkContext after
srowen left a comment
Should this be specific to Kubernetes?
Does it need to be a config or a method you can call?
Actually, why would you not kill the contexts after main exits in any case?
The original #32283 was Kubernetes-specific. This PR just adds a new config that keeps the old behaviour as the default but makes the new one available as well.
Unfortunately, there is a use case for both behaviours. See the next point.
I bumped into this change when I analysed an application where Spark was used as a job server.
cc @mridulm

Our customer also encountered this issue recently. They are migrating their Spark job server (a Spring Boot application) from Spark 2.4 to Spark 3.
I am closing this PR. Job servers have the option to do a blocking call in the main method to avoid the auto-stopping of the active SparkContexts.
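For reference, here is a minimal sketch of that workaround: a job-server style main that blocks so the SparkContext is not stopped when main returns. The object name, the latch, and the shutdown hook below are illustrative assumptions, not code from this PR.

```scala
import java.util.concurrent.CountDownLatch

import org.apache.spark.sql.SparkSession

// Hypothetical job-server entry point (not from this PR): main blocks on a
// latch so it does not return, and therefore the active SparkContext is not
// auto-stopped, until the JVM is asked to shut down.
object JobServerMain {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("job-server").getOrCreate()
    val stopped = new CountDownLatch(1)
    sys.addShutdownHook(stopped.countDown())  // release the latch on JVM shutdown
    // ... start the embedded server that submits jobs against `spark` here ...
    stopped.await()                           // blocking call keeps main alive
    spark.stop()
  }
}
```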
What changes were proposed in this pull request?
Introducing a config to close all active SparkContexts after the Main method has finished.
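A minimal sketch of what such a config entry could look like in the Kubernetes module, following Spark's ConfigBuilder pattern. The key name, doc text, default, and version are illustrative assumptions, not the exact values added by this PR:

```scala
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical entry for resource-managers/kubernetes/core/.../Config.scala
// (it would live inside the existing Config object in the real file). The key
// name, default, and version are assumptions for illustration only.
val KUBERNETES_STOP_ACTIVE_CONTEXTS_AFTER_MAIN =
  ConfigBuilder("spark.kubernetes.driver.stopSparkContextsAfterMain")
    .doc("Whether to stop all active SparkContexts after the user's main method has finished.")
    .version("3.5.0")
    .booleanConf
    .createWithDefault(true)
```

Keeping the entry in the k8s module follows the review feedback above about not adding resource-manager-specific configs to core.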
Why are the changes needed?
We ran into errors after upgrading from Spark 3.1 to Spark 3.2, as the SparkContext got closed right after the application started. It turned out the root cause was SPARK-34674, which introduced closing the SparkContexts after the Main method has finished. For details see #32283.
This application was a Spark job server built on top of Spring Boot, so all the job submissions happened outside of the main method.
Does this PR introduce any user-facing change?
With the current default (true) the behaviour is the same as for YARN.
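As a hedged usage sketch, reusing the hypothetical key from above (the actual key name introduced by this PR may differ), a job server could opt out of the auto-stop like this:

```scala
import org.apache.spark.SparkConf

// Hypothetical opt-out: keep SparkContexts alive after main returns.
// The key name is an assumption carried over from the sketch above.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.stopSparkContextsAfterMain", "false")

// With the default (true) the existing behaviour is kept; reading it back:
val stopAfterMain = conf.getBoolean("spark.kubernetes.driver.stopSparkContextsAfterMain", true)
```

In practice the same key would typically be passed with spark-submit's --conf option.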
How was this patch tested?
Manually.