-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-24566][CORE] spark.storage.blockManagerSlaveTimeoutMs default config #21575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
maropu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz use DurationConversions.
|
I have made the modification, @maropu please review the code, thank you |
|
Have you checked if the other parameters (spark.shuffle.io.connectionTimeout, spark.rpc.askTimeout or spark.rpc.lookupTimeout) could have the same issue? |
|
It seems that "spark.core.connection.ack.wait.timeout" and "spark.shuffle.io.connectionTimeout" are used only in tests which might be legacy and do not have an impact on normal code, and "spark.rpc.lookupTimeout" don't have the same issue. |
|
btw, better to add tests and can you do? |
|
I added the tests, thanks @maropu |
|
ok to test |
|
Test build #91985 has started for PR 21575 at commit |
|
Jenkins, retest this please |
|
Test build #92004 has finished for PR 21575 at commit
|
|
cc: @jiangxb1987 |
| // "milliseconds" | ||
| private val slaveTimeoutMs = | ||
| sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs", "120s") | ||
| sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The val slaveTimeoutMs is only used in
| sc.conf.getTimeAsSeconds("spark.network.timeout", s"${slaveTimeoutMs}ms") * 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @zsxwing to confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I look at this carefully, I think your are right, thanks @jiangxb1987 . One case that is not relevant with this PR is like this: set spark.storage.blockManagerSlaveTimeoutMs=900ms and not configure spark.network.timeout, then executorTimeoutMs will be 0 since getTimeAsSeconds loos precision for ms. This config maybe not reasonable. If need fix how about add ensuring > 0 or make executorTimeoutMs's min value as 1, @jiangxb1987 @zsxwing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we never use slaveTimeoutMs. Let's delete this val and just assign this value to executorTimeoutMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed temp val slaveTimeout, also timeoutIntervalMs is the same case, so removed too, thanks @zsxwing @jiangxb1987
|
Test build #92160 has finished for PR 21575 at commit
|
| } | ||
| } | ||
|
|
||
| test("SPARK-24566") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is useless. It tests copy-pasted codes and if someone changes the original codes, this will still pass. I prefer to not add this one.
| sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs", "120s") | ||
| private val executorTimeoutMs = | ||
| sc.conf.getTimeAsSeconds("spark.network.timeout", s"${slaveTimeoutMs}ms") * 1000 | ||
| sc.conf.getTimeAsSeconds("spark.network.timeout", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant something like this to match the docs:
private val executorTimeoutMs =
sc.conf.getTimeAsMs(
"spark.storage.blockManagerSlaveTimeoutMs",
s"${sc.conf.getTimeAsSeconds("spark.network.timeout", "120s")}s")
Could you also change
Line 637 in 53c06dd
| s"${sc.conf.getTimeAsSeconds("spark.network.timeout", "120s") * 1000L}ms"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, please have a review, thank you @zsxwing @jiangxb1987
| sc.conf.getTimeAsMs("spark.storage.blockManagerTimeoutIntervalMs", "60s") | ||
| private val checkTimeoutIntervalMs = | ||
| sc.conf.getTimeAsSeconds("spark.network.timeoutInterval", s"${timeoutIntervalMs}ms") * 1000 | ||
| sc.conf.getTimeAsSeconds("spark.network.timeoutInterval", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert this since it's unrelated.
|
|
||
| import scala.collection.mutable | ||
| import scala.concurrent.Future | ||
| import scala.concurrent.duration._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be unused after you address my comments
|
Test build #92479 has finished for PR 21575 at commit
|
|
LGTM. Thanks! Merging to master. |
What changes were proposed in this pull request?
This PR use spark.network.timeout in place of spark.storage.blockManagerSlaveTimeoutMs when it is not configured, as configuration doc said
How was this patch tested?
manual test