@@ -813,8 +813,14 @@ private[spark] class Client(
     if (pythonPath.nonEmpty) {
       val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val newValue =
Contributor:

You could just say env.get("PYTHONPATH") ++=: pythonPath before turning the list into a string.

But there are also two extra questions here:

  • precedence: should the env come before or after the files added with py-files? I kinda think after makes more sense, since the files are generally provided on the command line.
  • should appMasterEnv be reflected in executors? With your code it is. I'm not so sure it should.
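
A standalone sketch of the ++=: suggestion, with stand-in values rather than the names in the code under review (sep stands in for ApplicationConstants.CLASS_PATH_SEPARATOR):

import scala.collection.mutable.{HashMap, ListBuffer}

// ++=: prepends a collection to a ListBuffer in place; an Option contributes zero or
// one elements, so no explicit "is it set" check is needed.
val env = HashMap("PYTHONPATH" -> "/from/appMasterEnv")
val pythonPath = ListBuffer("pyspark.zip", "py4j-src.zip") // paths Spark collects (py-files etc.)
val sep = ":"

env.get("PYTHONPATH") ++=: pythonPath
val pythonPathStr = pythonPath.mkString(sep)
// pythonPathStr == "/from/appMasterEnv:pyspark.zip:py4j-src.zip"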

Contributor:

Good questions.

  • precedence: Right now you can work around this issue by exporting PYTHONPATH before you launch spark-submit. That value could just be something sitting in someone's env on the launcher box, though, and might not be what you want in a YARN container, so I would think that a PYTHONPATH specified explicitly via spark.yarn.appMasterEnv should take precedence over it, since it was explicitly configured. The second question is where that leaves py-files, and that one isn't as clear to me since, like you said, those are also explicitly specified. Maybe we do py-files, then spark.yarn.appMasterEnv.PYTHONPATH, and last the env PYTHONPATH (sketched below). That is different from the way it is now, though. Thoughts?

  • agree this should not be reflected in the executors, so if it is, we shouldn't do that. We should make sure that spark.executorEnv.PYTHONPATH works.
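
A sketch of that ordering with stand-in values (not the actual patch). Python searches earlier PYTHONPATH entries first, so listing the py-files paths first gives them the highest effective precedence:

// Stand-ins: pyFiles ~ paths Spark builds for py-files, amEnv ~ spark.yarn.appMasterEnv.PYTHONPATH,
// launcherEnv ~ whatever PYTHONPATH was exported before spark-submit was launched.
val pyFiles = Seq("pyspark.zip", "app.zip")
val amEnv = Option("/from/appMasterEnv")
val launcherEnv = sys.env.get("PYTHONPATH")
val sep = ":" // stand-in for ApplicationConstants.CLASS_PATH_SEPARATOR

// py-files first, then spark.yarn.appMasterEnv.PYTHONPATH, then the launcher's PYTHONPATH.
val orderedPythonPath = (pyFiles ++ amEnv ++ launcherEnv).mkString(sep)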

Author:

Thank you for your comments, @vanzin; I have made the necessary changes. As far as precedence is concerned, I was not sure I understood your question at first, but @tgravescs clarified it for me.

Contributor:

Also note that, as @vanzin said, you can just use the ++=: operator on the ListBuffer to prepend and get rid of the if conditions before converting to a string:
env.get("PYTHONPATH") ++=: (sys.env.get("PYTHONPATH") ++=: pythonPath)

Author:

@tgravescs I have replaced the if-else code with the ++ operator.

Contributor:

This seems unnecessary. What you're trying to do here is concatenate the values of the current PYTHONPATH env variable, the PYTHONPATH created by Spark, and the PYTHONPATH set in spark.yarn.appMasterEnv.

That's, respectively, sys.env.get("PYTHONPATH"), pythonPath, and env.get("PYTHONPATH"). So just concatenate those.
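
In code, that simplification might look like the following (a sketch reusing the names from the diff above, not necessarily the final change; the concatenation order here is just the order the sources are listed in and would follow whatever precedence is settled on):

// Empty Options contribute nothing, so the if/else on env.contains("PYTHONPATH") goes away.
val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath ++ env.get("PYTHONPATH"))
  .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
env("PYTHONPATH") = pythonPathStr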

Author:

I would have done that, but I need the pythonPathStr variable to set it in executorEnv, so I have not modified that bit. Let me know if you think otherwise.

+        if (env.contains("PYTHONPATH")) {
+          env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+        } else {
+          pythonPathStr
+        }
+      env("PYTHONPATH") = newValue
+      sparkConf.setExecutorEnv("PYTHONPATH", newValue)
     }
 
     if (isClusterMode) {
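
Putting the two threads together, one possible shape for this block is sketched below. It reuses the names from the diff above and assumes sparkConf.getExecutorEnv (which reads the spark.executorEnv.* settings) is available here; it is only an illustration of the suggestions in the discussion, not the change that was eventually merged.

// AM container: py-files paths, then spark.yarn.appMasterEnv.PYTHONPATH (already present in
// `env`), then the launcher's PYTHONPATH, per the precedence discussed above.
env("PYTHONPATH") = (pythonPath ++ env.get("PYTHONPATH") ++ sys.env.get("PYTHONPATH"))
  .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)

// Executors: honor spark.executorEnv.PYTHONPATH instead of the appMasterEnv value, so the
// AM-only setting is not propagated to executors. (Whether the launcher's PYTHONPATH should
// also be included for executors is left open here.)
val executorPythonPath =
  (pythonPath ++ sparkConf.getExecutorEnv.toMap.get("PYTHONPATH"))
    .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
sparkConf.setExecutorEnv("PYTHONPATH", executorPythonPath)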