Conversation

@davies
Contributor

@davies davies commented Nov 14, 2014

When the JVM is started from a Python process, it should exit once its stdin is closed.

Test: add spark.driver.memory to conf/spark-defaults.conf, then run:

```
davies@dm:~/work/spark$ cat conf/spark-defaults.conf
spark.driver.memory       8g
davies@dm:~/work/spark$ bin/pyspark
>>> quit
davies@dm:~/work/spark$ jps
4931 Jps
286
davies@dm:~/work/spark$ python wc.py
943738
0.719928026199
davies@dm:~/work/spark$ jps
286
4990 Jps
```
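For readers skimming the thread, the mechanism being discussed can be sketched as follows. This is only an illustrative sketch under assumptions, not the actual patch: the object name StdinExitMonitor is invented, and the real change wires this behavior into Spark's Python launch path. The JVM inherits a pipe from the Python parent as its stdin, so reading EOF there is a reliable signal that the parent has exited.

```scala
// Illustrative sketch (not the actual patch): a daemon thread that blocks on
// the JVM's stdin and terminates the JVM once the stream is closed, i.e. once
// the Python parent process has exited or closed its end of the pipe.
object StdinExitMonitor {  // hypothetical name, for illustration only
  def start(): Unit = {
    val monitor = new Thread("stdin-exit-monitor") {
      override def run(): Unit = {
        // read() blocks until a byte arrives or EOF (-1) is reached.
        while (System.in.read() != -1) {}
        System.exit(0)
      }
    }
    monitor.setDaemon(true)  // never keeps the JVM alive on its own
    monitor.start()
  }
}
```

Because the watcher is a daemon thread it cannot keep the JVM alive by itself; it only forces an exit once stdin reaches EOF, which matches the jps output above (no stray SparkSubmit process left behind after the Python script finishes).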

@davies
Contributor Author

davies commented Nov 14, 2014

cc @andrewor14

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23397 has started for PR 3274 at commit 050651f.

  • This patch merges cleanly.

@vanzin
Contributor

vanzin commented Nov 14, 2014

So, if I understand correctly, this handles the case where pyspark apps are not executed using the pyspark script, but with python directly?

It feels a little bit sketchy to support that, but the change looks good.

Contributor

I would just call this PYSPARK, and rename the variable isPySpark

Contributor

(before you do that, can you search the codebase to see if we already use the PYSPARK environment variable? It would be good to avoid clobbering it)

@andrewor14
Contributor

@vanzin Yes, your understanding is correct. I think this is safe to support in case the user wants to use different versions of python. Otherwise this silently does not kill the outer process, which is unintuitive.

@vanzin
Copy link
Contributor

vanzin commented Nov 15, 2014

in case the user wants to use different versions of python

Is there a way to define the python executable to use for the executors? Otherwise this will end up in tears, since pickle is not compatible across python versions...

@SparkQA

SparkQA commented Nov 15, 2014

Test build #23399 has started for PR 3274 at commit ce8599c.

  • This patch merges cleanly.

@davies
Contributor Author

davies commented Nov 15, 2014

@vanzin The python used in the executors can be defined by PYSPARK_PYTHON, so it's easy to run pyspark with a different python, such as:

$ PYSPARK_PYTHON=pypy pypy wc.py

Or run python with any options:

$ python -u -s -B wc.py
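To make the PYSPARK_PYTHON point concrete, here is a rough sketch of the general pattern (an assumption about the mechanism, not a quote of Spark's worker-launch code): the interpreter used for executor-side workers is read from the environment, so driver and executors can agree on the Python version.

```scala
// Rough sketch of the pattern described above (not Spark's actual code):
// pick the Python interpreter for worker processes from PYSPARK_PYTHON,
// falling back to plain "python" when the variable is unset.
val pythonExec: String = sys.env.getOrElse("PYSPARK_PYTHON", "python")

// Hypothetical worker launch using that interpreter; "worker.py" is a
// placeholder, not the real entry point.
val worker = new ProcessBuilder(pythonExec, "worker.py").start()
```

This is also the answer to the pickle concern above: as long as PYSPARK_PYTHON points at an interpreter compatible with the one driving the job, serialized data stays readable on both sides.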

@vanzin
Contributor

vanzin commented Nov 15, 2014

Ah, cool. Thanks for clarifying.

@SparkQA

SparkQA commented Nov 15, 2014

Test build #23397 has finished for PR 3274 at commit 050651f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class InSet(value: Expression, hset: Set[Any])
    • case class In(attribute: String, values: Array[Any]) extends Filter

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23397/
Test PASSed.

Contributor

Sorry, I missed this case. Actually, not all pyspark applications should go through this path. On second thought, I think we should rename this variable to IS_PYTHON_SUBPROCESS:

env["IS_PYTHON_SUBPROCESS"] = "1" # Tell JVM to exit after python exits

@andrewor14
Contributor

Hey @davies, sorry I missed the case in which the python application is run through spark-submit, which doesn't actually go through this code path. I have provided suggestions for renaming the variables and rephrasing certain comments.

@SparkQA

SparkQA commented Nov 15, 2014

Test build #23399 timed out for PR 3274 at commit ce8599c after a configured wait of 120m.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23399/
Test FAILed.

@SparkQA

SparkQA commented Nov 15, 2014

Test build #23406 has started for PR 3274 at commit df0e524.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 15, 2014

Test build #23406 has finished for PR 3274 at commit df0e524.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23406/
Test PASSed.

Contributor

can run. I'll fix this when I merge it

@andrewor14
Contributor

Ok, merging this into master and 1.2. Thanks @davies.

@asfgit asfgit closed this in 7fe08b4 Nov 15, 2014
asfgit pushed a commit that referenced this pull request Nov 15, 2014
When JVM is started in a Python process, it should exit once the stdin is closed.

test: add spark.driver.memory in conf/spark-defaults.conf

```
daviesdm:~/work/spark$ cat conf/spark-defaults.conf
spark.driver.memory       8g
daviesdm:~/work/spark$ bin/pyspark
>>> quit
daviesdm:~/work/spark$ jps
4931 Jps
286
daviesdm:~/work/spark$ python wc.py
943738
0.719928026199
daviesdm:~/work/spark$ jps
286
4990 Jps
```

Author: Davies Liu <[email protected]>

Closes #3274 from davies/exit and squashes the following commits:

df0e524 [Davies Liu] address comments
ce8599c [Davies Liu] address comments
050651f [Davies Liu] JVM should exit after Python exit

(cherry picked from commit 7fe08b4)
Signed-off-by: Andrew Or <[email protected]>