[SPARK-4415] [PySpark] JVM should exit after Python exit #3274
Conversation
cc @andrewor14
Test build #23397 has started for PR 3274 at commit
So, if I understand correctly, this handles the case where PySpark apps are not executed using the pyspark script, but with python directly? It feels a little bit sketchy to support that, but the change looks good.
I would just call this PYSPARK, and rename the variable to isPySpark
(before you do that, can you search the codebase to see if we already use the PYSPARK environment variable? It would be good to avoid clobbering it)
@vanzin Yes, your understanding is correct. I think this is safe to support in case the user wants to use a different version of Python. Otherwise this silently does not kill the outer process, which is unintuitive.
Is there a way to define the Python executable to use for the executors? Otherwise this will end up in tears, since pickle is not compatible across Python versions...
Test build #23399 has started for PR 3274 at commit
@vanzin The Python used on the executors can be defined by PYSPARK_PYTHON, so it's easy to run PySpark with a different Python, or to run python with any options (see the sketch below).
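For illustration, here is a minimal sketch of overriding the executor interpreter; the interpreter path and app name are placeholders, not values from this PR:

```python
import os

# PYSPARK_PYTHON tells Spark which interpreter to launch for the
# Python workers on the executors. The path below is a placeholder;
# point it at whatever interpreter (plus options) you need.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python2.6"

from pyspark import SparkContext

sc = SparkContext("local", "interpreter-override-demo")
print(sc.parallelize(range(100)).sum())  # 4950, computed by the workers
sc.stop()
```

The same override works from the shell, e.g. `PYSPARK_PYTHON=python2.6 bin/pyspark`.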
Ah, cool. Thanks for clarifying.
Test build #23397 has finished for PR 3274 at commit
Test PASSed.
python/pyspark/java_gateway.py (outdated)
Sorry, I missed this case. Actually, not all PySpark applications go through this path. On second thought, I think we should rename this variable to IS_PYTHON_SUBPROCESS:

```
env["IS_PYTHON_SUBPROCESS"] = "1"  # Tell JVM to exit after Python exits
```
Hey @davies, sorry I missed the case in which the Python application is run through spark-submit, which doesn't actually go through this code path. I have left suggestions for renaming the variables and rephrasing certain comments.
Test build #23399 timed out for PR 3274 at commit

Test FAILed.

Test build #23406 has started for PR 3274 at commit

Test build #23406 has finished for PR 3274 at commit

Test PASSed.
can run. I'll fix this when I merge it.
Ok, merging this into master and 1.2. Thanks @davies.
When the JVM is started from a Python process, it should exit once its stdin is closed.

test: add spark.driver.memory in conf/spark-defaults.conf

```
daviesdm:~/work/spark$ cat conf/spark-defaults.conf
spark.driver.memory 8g
daviesdm:~/work/spark$ bin/pyspark
>>> quit
daviesdm:~/work/spark$ jps
4931 Jps
286
daviesdm:~/work/spark$ python wc.py
943738
0.719928026199
daviesdm:~/work/spark$ jps
286
4990 Jps
```

Author: Davies Liu <[email protected]>

Closes #3274 from davies/exit and squashes the following commits:

df0e524 [Davies Liu] address comments
ce8599c [Davies Liu] address comments
050651f [Davies Liu] JVM should exit after Python exit

(cherry picked from commit 7fe08b4)

Signed-off-by: Andrew Or <[email protected]>
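The mechanism described in the commit message (the child exits once its stdin is closed) can be shown in miniature. The watcher below is a Python analogue of what the JVM side needs, a sketch of the pattern rather than Spark's actual code:

```python
import os
import sys
import threading

def exit_when_stdin_closes():
    # read() on a pipe returns only at EOF, and EOF arrives exactly
    # when the parent process (the other end of the pipe) goes away.
    sys.stdin.buffer.read()
    os._exit(0)  # terminate immediately; the parent is already gone

# Run the watcher in the background so normal work continues meanwhile.
watcher = threading.Thread(target=exit_when_stdin_closes)
watcher.daemon = True
watcher.start()
```

Watching the pipe is more robust than polling the parent's PID, since the pipe reaches EOF no matter how the parent dies.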