[SPARK-47081][CONNECT] Support Query Execution Progress #45150
Closed
Changes shown from 1 commit (of 30). Commits:
- 084d257 Initial draft (grundprinzip)
- 234b927 update (grundprinzip)
- f78519e update (grundprinzip)
- 962dfd4 fix race condition (grundprinzip)
- 4947e79 fix lint (grundprinzip)
- 228717f lint (grundprinzip)
- 36d7924 lint (grundprinzip)
- dfb29e4 fix (grundprinzip)
- be08f53 more progress stuff (grundprinzip)
- be7c445 Merge remote-tracking branch 'origin/master' into HEAD (grundprinzip)
- 1b1a61a fix (grundprinzip)
- aa924c0 fix (grundprinzip)
- 7cedd98 doc (grundprinzip)
- e2063f2 fixing tests (grundprinzip)
- 84425c3 fixing lint (grundprinzip)
- 50e4cbd fixing lint (grundprinzip)
- 677e70b refactoring to expose stage information (grundprinzip)
- 71033d0 fix (grundprinzip)
- 5687f6c fix (grundprinzip)
- 2d75941 fix tests and lint (grundprinzip)
- 30560d0 Merge remote-tracking branch 'origin/master' into HEAD (grundprinzip)
- 453bda9 lint (grundprinzip)
- cc864c9 lint (grundprinzip)
- ad4791e review comments (grundprinzip)
- b662410 lint (grundprinzip)
- 85caee5 merge (grundprinzip)
- ac91982 doc update (grundprinzip)
- deffbbc doc update (grundprinzip)
- 415bdd8 doc update (grundprinzip)
- 6fcc36f fix lint (grundprinzip)
Showing commit 084d257d215cf3c415a65d5274ce51e76f645cab ("Initial draft").
.../main/scala/org/apache/spark/sql/connect/execution/ConnectProgressExecutionListener.scala (124 additions, 0 deletions)
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.connect.execution

import org.apache.spark.internal.Logging
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart, SparkListenerStageCompleted, SparkListenerTaskEnd}

/**
 * A listener that tracks the execution of jobs and stages for a given set of tags. This is used
 * to track the progress of a job that is being executed through the Connect API.
 *
 * The listener is instantiated once for the SparkConnectService and then used to track all
 * current query executions.
 */
private[connect] class ConnectProgressExecutionListener extends SparkListener with Logging {

  /**
   * A tracker for a given tag. This is used to track the progress of an operation that is being
   * executed through the Connect API.
   */
  class ExecutionTracker(var tag: String) {
    private[ConnectProgressExecutionListener] var jobs: Set[Int] = Set()
    private[ConnectProgressExecutionListener] var stages: Set[Int] = Set()
    private[ConnectProgressExecutionListener] var totalTasks = 0
    private[ConnectProgressExecutionListener] var completedTasks = 0
    private[ConnectProgressExecutionListener] var completedStages = 0
    private[ConnectProgressExecutionListener] var inputBytesRead = 0L
    // The tracker is marked as dirty if it has new progress to report. This variable does not
    // need to be protected by a mutex: even if multiple threads read the same dirty state,
    // the output is expected to be identical.
    @volatile private[ConnectProgressExecutionListener] var dirty = false

    /**
     * Yield the current state of the tracker if it is dirty. A consumer of the tracker can
     * provide a callback that will be called with the current state of the tracker if the
     * tracker has new progress to report.
     *
     * If the tracker was marked as dirty, the state is reset afterwards.
     */
    def yieldWhenDirty(thunk: (Int, Int, Int, Int, Long) => Unit): Unit = {
      if (dirty) {
        // Report: total tasks, completed tasks, total stages, completed stages, bytes read.
        thunk(totalTasks, completedTasks, stages.size, completedStages, inputBytesRead)
        dirty = false
      }
    }

    /**
     * Add a job to the tracker. This adds the job to the set of tracked jobs, registers its
     * stages, and accounts for the tasks of all its stages.
     */
    def addJob(job: SparkListenerJobStart): Unit = {
      jobs = jobs + job.jobId
      stages = stages ++ job.stageIds
      totalTasks += job.stageInfos.map(_.numTasks).sum
    }
  }

  val trackedTags = collection.mutable.Map[String, ExecutionTracker]()

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    val tags = jobStart.properties.getProperty("spark.job.tags")
    if (tags != null) {
      val thisJobTags = tags.split(",").map(_.trim).toSet
      thisJobTags.foreach { tag =>
        if (trackedTags.contains(tag)) {
          trackedTags(tag).addJob(jobStart)
        }
      }
    }
  }

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // Check if the task belongs to a job that we are tracking.
    trackedTags.foreach { case (tag, tracker) =>
      if (tracker.stages.contains(taskEnd.stageId)) {
        tracker.completedTasks += 1
        tracker.inputBytesRead += taskEnd.taskMetrics.inputMetrics.bytesRead
        tracker.dirty = true
      }
    }
  }

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    trackedTags.foreach { case (tag, tracker) =>
      if (tracker.stages.contains(stageCompleted.stageInfo.stageId)) {
        tracker.completedStages += 1
      }
    }
  }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    trackedTags.foreach { case (tag, tracker) =>
      if (tracker.jobs.contains(jobEnd.jobId)) {
        tracker.jobs -= jobEnd.jobId
      }
    }
  }

  def registerJobTag(tag: String): Unit = {
    trackedTags += tag -> new ExecutionTracker(tag)
  }

  def removeJobTag(tag: String): Unit = {
    trackedTags -= tag
  }

  def clearJobTags(): Unit = {
    trackedTags.clear()
  }
}
```
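The core mechanism in the listener is a dirty flag that decouples the scheduler callbacks (which update counters) from the reporting loop (which polls for snapshots). A self-contained sketch of that pattern, in plain Scala with no Spark dependency and illustrative names:

```scala
// Standalone sketch of the dirty-flag pattern: "listener" callbacks update
// counters and set `dirty`; the "reporting" side polls with yieldWhenDirty
// and only receives a snapshot when there is new progress to report.
class MiniTracker {
  @volatile private var dirty = false
  private var totalTasks = 0
  private var completedTasks = 0

  // Called from the listener side.
  def addJob(numTasks: Int): Unit = { totalTasks += numTasks; dirty = true }
  def taskCompleted(): Unit = { completedTasks += 1; dirty = true }

  // Called from the reporting side; resets the flag after a report.
  def yieldWhenDirty(thunk: (Int, Int) => Unit): Unit = {
    if (dirty) {
      thunk(totalTasks, completedTasks)
      dirty = false
    }
  }
}

object MiniTrackerDemo extends App {
  val tracker = new MiniTracker
  tracker.addJob(numTasks = 4)
  tracker.taskCompleted()

  tracker.yieldWhenDirty((total, done) => println(s"$done/$total tasks"))
  // A second poll without new progress reports nothing:
  tracker.yieldWhenDirty((_, _) => println("not reached"))
}
```

A consequence of this design is that a poll that finds the flag clear is essentially free, so the reporting loop can run at a fixed interval without coordinating with the scheduler threads.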
Review discussion:

- Is this for the current running stage or all stages?
- Across all stages. It can always be extended later.
- I'm wondering how this can be accurate. With AQE we never know the number of partitions for the next stage, as re-optimization can happen.
- The goal of the progress metrics is not to be accurate about the future but only to represent a snapshot of the current state. This means the number of tasks can be updated when new stages are added or AQE kicks in. The point is that the number of remaining tasks will converge over time and become stable.
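These snapshot semantics can be illustrated with a tiny standalone sketch (plain Scala, illustrative names, not PR code): the known total grows while the query runs, so a reported fraction is only ever accurate for the work known so far.

```scala
// Snapshot-style progress: the total can grow mid-query (e.g. when AQE
// materializes a new stage), so each report reflects only currently known work.
final case class Progress(completedTasks: Int, totalTasks: Int) {
  def fraction: Double =
    if (totalTasks == 0) 0.0 else completedTasks.toDouble / totalTasks
}

object SnapshotDemo extends App {
  var p = Progress(completedTasks = 0, totalTasks = 100)
  p = p.copy(completedTasks = 100)  // first stages finish: briefly looks "done"
  p = p.copy(totalTasks = 120)      // AQE adds a stage with 20 more tasks
  println(p.fraction)               // drops back below 1.0, then converges
}                                   // as the set of stages stabilizes
```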
- (Just my 2c: I think having any progress bar is much better than none. The standard Spark progress bar has some ups and some downs; having new progress bars appear definitely isn't the most intuitive either, though it's probably net better than one progress bar that gets longer. I would much prefer having some progress bar now that we can extend later, perhaps as we get a better sense of how to incorporate AQE and future stages into the UX.)
- After a second thought, it's better to hide Spark internals (stages) from end users, and eventually we should have only one progress bar for the query. So the current PR is a good starting point.
  However, this server-client protocol needs to be stable, and we don't want to change the client frequently to improve progress reporting. Can we define a minimum set of information to send to the client side to display the progress bar? I feel it's better to calculate the percentage on the server side.
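One way to read this suggestion is that the server would fold the raw tracker counters into a single number before it crosses the wire. A hypothetical helper (not part of the PR; names are illustrative) might look like:

```scala
// Hypothetical server-side helper: collapse raw counters into one percentage
// so the wire protocol stays stable even if the underlying metrics evolve.
def progressPercent(completedTasks: Int, totalTasks: Int): Double =
  if (totalTasks <= 0) 0.0
  // Cap at 100: with AQE, completed work can briefly exceed the known total.
  else math.min(100.0, completedTasks.toDouble / totalTasks * 100.0)
```

The trade-off, as the thread goes on to discuss, is that a single percentage is stable but lossy; richer clients cannot reconstruct stage-level detail from it.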
- So I refactored the code to avoid closing any doors. I did not change the way the progress bar is displayed, but I extended the progress message to capture the stage-wise information so other clients can decide independently how to present it to the end user.
- +1. @cloud-fan, what do you think about that? Capture stage-level info in the proto, but keep the display simple for now?
- Yeah, this is more flexible. The proto message contains all the information, and clients can do whatever they want.