-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-3571] Spark standalone cluster mode doesn't work. #2436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -491,14 +491,13 @@ private[spark] class Master( | |
| val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE)) | ||
| val aliveWorkerNum = shuffledAliveWorkers.size | ||
| var curPos = 0 | ||
| var stopPos = aliveWorkerNum | ||
| for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers | ||
| // We assign workers to each waiting driver in a round-robin fashion. For each driver, we | ||
| // start from the last worker that was assigned a driver, and continue onwards until we have | ||
| // explored all alive workers. | ||
| curPos = (curPos + 1) % aliveWorkerNum | ||
| val startPos = curPos | ||
| var launched = false | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the correct thing to do here is keep a counter that keeps track of how many workers we've looked at so far, then the while loop would look like:
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Then you don't need
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's one way but if lots of driver waits, while loop have to start shuffleedWorkers(0) per one driver. I think it's not efficient.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wait, what do you mean? That's just another way of writing the same thing, and we still start at
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we use numWorkersVisited and numWorkersAlive, numWorkersVisited is set to 0 before entering while loop even though still in for loop.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I still don't understand what you mean.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is what I mean: andrewor14/spark@apache:master...andrewor14:fix-standalone-cluster
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I noticed and wrote same code.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had misunderstanding. I thought you mentioned removing stopPos and curPos. Now I understand keeping curPos. |
||
| while (curPos != startPos && !launched) { | ||
| while (curPos != stopPos && !launched) { | ||
| val worker = shuffledAliveWorkers(curPos) | ||
| if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) { | ||
| launchDriver(worker, driver) | ||
|
|
@@ -507,6 +506,8 @@ private[spark] class Master( | |
| } | ||
| curPos = (curPos + 1) % aliveWorkerNum | ||
| } | ||
| curPos = (stopPos + 1) % aliveWorkerNum | ||
| stopPos = curPos + aliveWorkerNum | ||
| } | ||
|
|
||
| // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this result in an infinite loop if no workers are available?
curPosis never equal toaliveWorkerNum