[SPARK-31895][PYTHON][SQL] Support DataFrame.explain(extended: str) case to be consistent with Scala side #28711
Test build #123454 has finished for PR 28711 at commit
maropu left a comment:
Thanks for the update! Looks reasonable.
btw, typo in the description?

Oh, I meant this is also consistent with

Thanks @maropu. Merged to master and branch-3.0!
[SPARK-31895][PYTHON][SQL] Support DataFrame.explain(extended: str) case to be consistent with Scala side
### What changes were proposed in this pull request?
Scala:
```scala
scala> spark.range(10).explain("cost")
```
```
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)
== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
```
PySpark:
```python
>>> spark.range(10).explain("cost")
```
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/dataframe.py", line 333, in explain
raise TypeError(err_msg)
TypeError: extended (optional) should be provided as bool, got <class 'str'>
```
In addition, this is consistent with other APIs; for example, `DataFrame.sample` can also accept both `DataFrame.sample(1.0)` and `DataFrame.sample(False)`.
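As an illustration, here is a minimal, self-contained sketch of how this kind of bool-or-string argument dispatch can be handled in Python. The helper name and exact rules are hypothetical, loosely modeled on the behavior described in this PR; it is not the actual PySpark implementation.

```python
# Hypothetical sketch of bool-or-string dispatch for explain(); not PySpark source.
def resolve_explain_args(extended=None, mode=None):
    # explain("cost") / explain(extended="cost"): interpret the string as a mode.
    if isinstance(extended, str) and mode is None:
        extended, mode = None, extended
    if extended is not None and not isinstance(extended, bool):
        raise TypeError(
            "extended (optional) should be provided as bool, got %s" % type(extended))
    if extended is not None and mode is not None:
        raise TypeError("extended and mode should not be set together.")
    if mode is not None and not isinstance(mode, str):
        raise TypeError(
            "mode (optional) should be provided as str, got %s" % type(mode))
    # Map the legacy bool flag onto an explain mode.
    if mode is not None:
        return mode
    return "extended" if extended else "simple"
```

With this shape, `resolve_explain_args("cost")` and `resolve_explain_args(mode="cost")` both resolve to the `"cost"` mode, while `resolve_explain_args(True)` keeps the legacy extended behavior.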
### Why are the changes needed?
To provide consistent behavior across the Scala and Python APIs.
### Does this PR introduce _any_ user-facing change?
No, these are changes only in unreleased branches.
If this lands in master only, then yes: users will be able to set `mode` via `df.explain("...")` in Spark 3.1.
After this PR:
```python
>>> spark.range(10).explain("cost")
```
```
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)
== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
```
### How was this patch tested?
A unit test was added, and the following calls were manually tested as well:
```python
spark.range(10).explain(True)
spark.range(10).explain(False)
spark.range(10).explain("cost")
spark.range(10).explain(extended="cost")
spark.range(10).explain(mode="cost")
spark.range(10).explain()
spark.range(10).explain(True, "cost")
spark.range(10).explain(1.0)
```
Closes #28711 from HyukjinKwon/SPARK-31895.
Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit e1d5201)
Signed-off-by: HyukjinKwon <[email protected]>
hahaha, I see.
Since Spark 3.0 will support the `DataFrame.explain(extended: str)` case (apache/spark#28711), we can follow it.
```py
>>> df.spark.explain("extended")  # doctest: +ELLIPSIS
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
...
```