[SPARK-47081][CONNECT] Support Query Execution Progress #45150
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -80,6 +80,7 @@ | |
| # Running MyPy type checks will always require pandas and | ||
| # other dependencies so importing here is fine. | ||
| from pyspark.sql.connect.client import SparkConnectClient | ||
| from pyspark.sql.connect.shell.progress import ProgressHandler | ||
|
|
||
| try: | ||
| import memory_profiler # noqa: F401 | ||
|
|
@@ -1967,6 +1968,8 @@ def client(self) -> "SparkConnectClient": | |
| message_parameters={"feature": "SparkSession.client"}, | ||
| ) | ||
|
|
||
| def | ||
|
|
||
| def addArtifacts( | ||
| self, *path: str, pyfile: bool = False, archive: bool = False, file: bool = False | ||
| ) -> None: | ||
|
|
@@ -2002,6 +2005,62 @@ def addArtifacts( | |
|
|
||
| addArtifact = addArtifacts | ||
|
|
||
| def registerProgressHandler(self, handler: "ProgressHandler") -> None: | ||
| """ | ||
| Register a progress handler to be called when a progress update is received from the server. | ||
|
|
||
| .. versionadded:: 4.0 | ||
|
Comment on lines +2008 to +2010
Contributor
Potentially silly question, but: When you look at the docs for this, it's not obvious that Spark Connect supports this method. Should this be explicitly noted in the docstring somehow? Or are users supposed to assume that everything supports Spark Connect unless explicitly noted otherwise?
Contributor
Another related question: Should there be narrative documentation of
Contributor, Author
Creating a PR with documentation updates would be very much appreciated! |
||
|
|
||
| Parameters | ||
| ---------- | ||
| handler : ProgressHandler | ||
| A callable that follows the ProgressHandler interface. This handler will be called | ||
| on every progress update. | ||
|
|
||
| Examples | ||
| -------- | ||
|
|
||
| >>> def progress_handler(stages, inflight_tasks, done): | ||
| ... print(f"{len(stages)} Stages known, Done: {done}") | ||
| >>> spark.registerProgressHandler(progress_handler) | ||
| >>> res = spark.range(10).repartition(1).collect() | ||
| 3 Stages known, Done: False | ||
| 3 Stages known, Done: True | ||
|
Member
This test is flaky: https://github.com/apache/spark/actions/runs/8564043093/job/23470007059. Let me skip it for now.
Contributor, Author
I'll unflake it. Thanks! |
||
| >>> spark.clearProgressHandlers() | ||
| """ | ||
| raise PySparkRuntimeError( | ||
| error_class="ONLY_SUPPORTED_WITH_SPARK_CONNECT", | ||
| message_parameters={"feature": "SparkSession.registerProgressHandler"}, | ||
| ) | ||
|
|
||
| def removeProgressHandler(self, handler: "ProgressHandler") -> None: | ||
| """ | ||
| Remove a progress handler that was previously registered. | ||
|
|
||
| .. versionadded:: 4.0 | ||
|
|
||
| Parameters | ||
| ---------- | ||
| handler : ProgressHandler | ||
| The handler to remove if present in the list of progress handlers. | ||
| """ | ||
| raise PySparkRuntimeError( | ||
| error_class="ONLY_SUPPORTED_WITH_SPARK_CONNECT", | ||
| message_parameters={"feature": "SparkSession.removeProgressHandler"}, | ||
| ) | ||
|
|
||
| def clearProgressHandlers(self) -> None: | ||
| """ | ||
| Clear all registered progress handlers. | ||
|
|
||
| .. versionadded:: 4.0 | ||
| """ | ||
| raise PySparkRuntimeError( | ||
| error_class="ONLY_SUPPORTED_WITH_SPARK_CONNECT", | ||
| message_parameters={"feature": "SparkSession.clearProgressHandlers"}, | ||
| ) | ||
|
|
||
|
|
||
| def copyFromLocalToFs(self, local_path: str, dest_path: str) -> None: | ||
| """ | ||
| Copy file from local to cloud storage file system. | ||
|
|
||
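For readers following the three new methods, here is a minimal, self-contained sketch of the register / remove / clear semantics described in the docstrings above. It does not use PySpark and is not the implementation from this PR: `HandlerRegistry`, `notify`, and the `Handler` alias are hypothetical names introduced for illustration, and skipping duplicate registrations is an assumption. The handler signature `(stages, inflight_tasks, done)` is taken from the docstring example.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical alias for illustration only. The real ProgressHandler type
# lives in pyspark.sql.connect.shell.progress and may differ.
Handler = Callable[[Optional[Sequence[object]], int, bool], None]


class HandlerRegistry:
    """Illustrative stand-in for a session's progress-handler list."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def register(self, handler: Handler) -> None:
        # Assumption: registering the same handler twice has no extra effect.
        if handler not in self._handlers:
            self._handlers.append(handler)

    def remove(self, handler: Handler) -> None:
        # "Remove if present": an unknown handler is silently ignored.
        if handler in self._handlers:
            self._handlers.remove(handler)

    def clear(self) -> None:
        self._handlers.clear()

    def notify(self, stages: Sequence[object], inflight_tasks: int, done: bool) -> None:
        # Iterate over a copy so a handler can remove itself while being called.
        for handler in list(self._handlers):
            handler(stages, inflight_tasks, done)


seen = []


def progress_handler(stages, inflight_tasks, done):
    # Record updates instead of printing so the result is easy to check.
    seen.append((len(stages), inflight_tasks, done))


registry = HandlerRegistry()
registry.register(progress_handler)
registry.notify(["s1", "s2", "s3"], 2, False)
registry.notify(["s1", "s2", "s3"], 0, True)
registry.remove(progress_handler)
registry.notify(["s1"], 0, True)  # no handler left, so nothing is recorded
```

Recording updates in a list rather than printing them keeps the check independent of how many updates arrive and when. This may be relevant to the flakiness noted in the review thread, though that connection is a guess.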