@xueyumusic
Contributor

What changes were proposed in this pull request?

This PR uses spark.network.timeout as the default for spark.storage.blockManagerSlaveTimeoutMs when the latter is not configured, as the configuration docs state.

How was this patch tested?

manual test

@xueyumusic changed the title from "spark.storage.blockManagerSlaveTimeoutMs default config" to "[SPARK-24566][CORE] spark.storage.blockManagerSlaveTimeoutMs default config" on Jun 15, 2018
Member

@maropu maropu left a comment

Please use DurationConversions.

@xueyumusic
Contributor Author

I have made the modification. @maropu, please review the code. Thank you.

@maropu
Member

maropu commented Jun 15, 2018

Have you checked if the other parameters (spark.shuffle.io.connectionTimeout, spark.rpc.askTimeout or spark.rpc.lookupTimeout) could have the same issue?

@xueyumusic
Contributor Author

It seems that "spark.core.connection.ack.wait.timeout" and "spark.shuffle.io.connectionTimeout" are used only in tests, which may be legacy and have no impact on normal code, and "spark.rpc.lookupTimeout" doesn't have the same issue.
The only case for "spark.rpc.askTimeout" that I am not sure about is https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/Client.scala#L229. I am not sure whether it is a special case that forces this config to 10s when it is not configured.

@maropu
Member

maropu commented Jun 16, 2018

By the way, it would be better to add tests. Could you do that?

@xueyumusic
Contributor Author

I added the tests, thanks @maropu

@felixcheung
Member

ok to test

@SparkQA

SparkQA commented Jun 17, 2018

Test build #91985 has started for PR 21575 at commit 9673613.

@felixcheung
Member

Jenkins, retest this please

@SparkQA

SparkQA commented Jun 17, 2018

Test build #92004 has finished for PR 21575 at commit 9673613.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jun 18, 2018

cc: @jiangxb1987

// "milliseconds"
private val slaveTimeoutMs =
sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs", "120s")
sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs",
Contributor

The val slaveTimeoutMs is only used in

sc.conf.getTimeAsSeconds("spark.network.timeout", s"${slaveTimeoutMs}ms") * 1000

so I don't think this change will fix anything.

Contributor

cc @zsxwing to confirm.

Contributor Author

I looked at this carefully and I think you are right, thanks @jiangxb1987. One case that is not related to this PR: if spark.storage.blockManagerSlaveTimeoutMs is set to 900ms and spark.network.timeout is not configured, executorTimeoutMs will be 0, because getTimeAsSeconds loses sub-second precision. Such a configuration may not be reasonable. If it needs fixing, how about ensuring the value is > 0, or making the minimum value of executorTimeoutMs 1? @jiangxb1987 @zsxwing
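
The precision loss in the 900ms case can be reproduced without Spark. This is only a minimal sketch: it assumes the seconds getter converts with java.util.concurrent.TimeUnit semantics (truncation toward zero), and the object and method names are hypothetical, not part of Spark.

```scala
import java.util.concurrent.TimeUnit

object TruncationDemo {
  // Hypothetical helper with the same shape as the old expression:
  // read a millisecond value as whole seconds, then scale back to ms.
  def secondsThenMillis(timeoutMs: Long): Long =
    TimeUnit.MILLISECONDS.toSeconds(timeoutMs) * 1000L

  def main(args: Array[String]): Unit = {
    println(secondsThenMillis(900L))    // 0: the sub-second value is lost
    println(secondsThenMillis(1500L))   // 1000: the fractional second is dropped
    println(secondsThenMillis(120000L)) // 120000: whole seconds survive
  }
}
```

Any value below one second collapses to 0 under this round trip, which is why reading the millisecond setting with a millisecond getter avoids the problem.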

Member

Looks like we never use slaveTimeoutMs. Let's delete this val and just assign this value to executorTimeoutMs.

Contributor Author

I have removed the temporary val slaveTimeoutMs. timeoutIntervalMs is the same case, so I removed it too. Thanks @zsxwing @jiangxb1987

@SparkQA

SparkQA commented Jun 21, 2018

Test build #92160 has finished for PR 21575 at commit 025bfb3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("SPARK-24566") {
Member

This test is useless. It tests copy-pasted code, so if someone changes the original code, it will still pass. I would prefer not to add it.

sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs", "120s")
private val executorTimeoutMs =
sc.conf.getTimeAsSeconds("spark.network.timeout", s"${slaveTimeoutMs}ms") * 1000
sc.conf.getTimeAsSeconds("spark.network.timeout",
Member

I meant something like this to match the docs:

  private val executorTimeoutMs =
    sc.conf.getTimeAsMs(
      "spark.storage.blockManagerSlaveTimeoutMs",
      s"${sc.conf.getTimeAsSeconds("spark.network.timeout", "120s")}s")

Could you also change

s"${sc.conf.getTimeAsSeconds("spark.network.timeout", "120s") * 1000L}ms"),
to use the above pattern?
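
To make the resolution order of this pattern concrete, here is a minimal sketch that runs without Spark. It assumes a plain Map in place of SparkConf and a hypothetical parser that understands only "<n>ms" and "<n>s" strings; the helper names mirror the SparkConf getters for readability but are stand-ins, not Spark's implementation.

```scala
object TimeoutFallbackDemo {
  // Hypothetical parser: supports only the "ms" and "s" suffixes.
  private def toMs(s: String): Long =
    if (s.endsWith("ms")) s.dropRight(2).toLong
    else if (s.endsWith("s")) s.dropRight(1).toLong * 1000L
    else throw new IllegalArgumentException(s"unsupported time string: $s")

  // Stand-ins for the SparkConf getters, backed by a plain Map.
  def getTimeAsMs(conf: Map[String, String], key: String, default: String): Long =
    toMs(conf.getOrElse(key, default))

  def getTimeAsSeconds(conf: Map[String, String], key: String, default: String): Long =
    toMs(conf.getOrElse(key, default)) / 1000L // integer division truncates

  // Same shape as the suggested code: the specific key wins,
  // and spark.network.timeout (default 120s) is only the fallback.
  def executorTimeoutMs(conf: Map[String, String]): Long = {
    val fallbackSeconds = getTimeAsSeconds(conf, "spark.network.timeout", "120s")
    getTimeAsMs(conf, "spark.storage.blockManagerSlaveTimeoutMs", s"${fallbackSeconds}s")
  }

  def main(args: Array[String]): Unit = {
    val network = "spark.network.timeout" -> "300s"
    val specific = "spark.storage.blockManagerSlaveTimeoutMs" -> "900ms"
    println(executorTimeoutMs(Map.empty))              // 120000
    println(executorTimeoutMs(Map(network)))           // 300000
    println(executorTimeoutMs(Map(specific)))          // 900
    println(executorTimeoutMs(Map(network, specific))) // 900
  }
}
```

Under these assumptions, a sub-second value for the specific key is preserved because it is read in milliseconds, and spark.network.timeout only applies when the specific key is absent.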

Contributor Author

Updated, please take a look. Thank you @zsxwing @jiangxb1987

sc.conf.getTimeAsMs("spark.storage.blockManagerTimeoutIntervalMs", "60s")
private val checkTimeoutIntervalMs =
sc.conf.getTimeAsSeconds("spark.network.timeoutInterval", s"${timeoutIntervalMs}ms") * 1000
sc.conf.getTimeAsSeconds("spark.network.timeoutInterval",
Member

please revert this since it's unrelated.


import scala.collection.mutable
import scala.concurrent.Future
import scala.concurrent.duration._
Member

this will be unused after you address my comments

@SparkQA

SparkQA commented Jun 29, 2018

Test build #92479 has finished for PR 21575 at commit 536025d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Jun 29, 2018

LGTM. Thanks! Merging to master.

@asfgit asfgit closed this in f71e8da Jun 29, 2018