core/src/main/scala/org/apache/spark/util/Utils.scala (2 additions, 1 deletion)
@@ -2434,7 +2434,8 @@ private[spark] object Utils extends Logging {
    */
   def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
     val sparkValue = conf.get(key, default)
-    if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
+    if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
Contributor:
No.

The logic you want here is the equivalent of:

if conf.contains(key)
  get spark conf
elif is_running_on_yarn()
  get conf from yarn
else
  return default
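
In Scala, against the method in this diff, that ordering would read roughly like the sketch below. This is a sketch only, not the patch itself; it simply reuses the identifiers already visible in the diff (SparkLauncher.SPARK_MASTER, SparkHadoopUtil, YarnConfiguration) to spell out the reviewer's pseudocode:

  def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
    if (conf.contains(key)) {
      // 1. An explicitly-set Spark configuration value wins.
      conf.get(key)
    } else if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
      // 2. Otherwise, when running on YARN, fall back to the Hadoop/YARN configuration.
      new YarnConfiguration(SparkHadoopUtil.get.newConfiguration(conf)).get(key, default)
    } else {
      // 3. Otherwise, return the caller-supplied default.
      default
    }
  }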

Member Author:
Assuming --conf spark.shuffle.service.port=7338 is configured, 7338 is displayed on the Environment tab, but 7337 is actually used.
So my idea is to take the value from SparkConf when the key starts with spark., except for spark.hadoop.* keys.

Member Author:
YarnConfiguration can only carry a single spark.shuffle.service.port value.
Reading spark.shuffle.service.port from SparkConf would let us upgrade the shuffle service gradually, because different applications could then be given different values.
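
For illustration only (the port numbers are invented), the per-application setup that motivates this would look like:

  // Hypothetical: point two applications at differently-versioned shuffle services
  // by setting the port per application rather than cluster-wide in yarn-site.xml.
  val oldAppConf = new SparkConf().set("spark.shuffle.service.port", "7337")
  val newAppConf = new SparkConf().set("spark.shuffle.service.port", "7338")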

Contributor:
I'm not sure I follow what you're saying, but let me explain how the configuration is expected to work.

"spark." options are set in "SparkConf". "spark.hadoop.*" options, on top of those, should also be reflected in any Hadoop Configuration objects that are created.

So you should never need to directly reference "spark.hadoop." properties in Spark code. They are not meant to be used by Spark, they are meant to be Hadoop configs. That's why I'm saying your code should not be doing what it is doing.

From what I understand of what you're trying to do, you want "spark.shuffle.service.port" to have precedence over the YARN configuration. For that, you just do what I suggested above. Check whether it's set in the Spark configuration before you even look at any Hadoop configuration.

The current order of precedence should be:

  • spark.hadoop.spark.shuffle.service.port (since it overrides Hadoop config)
  • hadoop config (spark.shuffle.service.port set in xml files)
  • spark.shuffle.service.port

You're proposing moving the lowest one to the top. That's a simple change. If you're trying to also fix something else, then it means there's a problem in another place.
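
To make that precedence concrete, here is a rough illustration (values invented), assuming the usual behaviour of SparkHadoopUtil.get.newConfiguration(conf), which copies every spark.hadoop.* entry into the Hadoop Configuration with the prefix stripped:

  val conf = new SparkConf()
    .set("spark.shuffle.service.port", "7338")               // plain Spark conf: lowest precedence today
    .set("spark.hadoop.spark.shuffle.service.port", "7400")  // propagated into the Hadoop Configuration

  // The stripped key shadows whatever the XML files set, so on a YARN master
  // getSparkOrYarnConfig would currently return "7400" here, not "7338".
  val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
  hadoopConf.get("spark.shuffle.service.port")  // "7400"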

+        && (key.startsWith("spark.hadoop.") || !key.startsWith("spark."))) {
       new YarnConfiguration(SparkHadoopUtil.get.newConfiguration(conf)).get(key, sparkValue)
     } else {
       sparkValue