
Conversation

@HyukjinKwon (Member) commented Jun 3, 2020

What changes were proposed in this pull request?

Scala:

```scala
scala> spark.range(10).explain("cost")
```
```
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
```

PySpark:

```python
>>> spark.range(10).explain("cost")
```
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 333, in explain
    raise TypeError(err_msg)
TypeError: extended (optional) should be provided as bool, got <class 'str'>
```

In addition, this is consistent with other APIs; for example, `DataFrame.sample` also supports both `DataFrame.sample(1.0)` and `DataFrame.sample(False)`.
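
As a rough illustration of how this kind of argument handling can work, here is a minimal, hypothetical sketch of just the dispatch logic (the helper name `_resolve_explain_mode` is made up for this example and is not the actual implementation in `pyspark/sql/dataframe.py`):

```python
# Hypothetical sketch: map explain()'s (extended, mode) arguments to a single
# mode string. The real DataFrame.explain delegates plan printing to the JVM.
def _resolve_explain_mode(extended=None, mode=None):
    # A string passed positionally as `extended` is treated as the mode,
    # so explain("cost") behaves like explain(mode="cost").
    if isinstance(extended, str) and mode is None:
        extended, mode = None, extended
    if extended is not None and mode is not None:
        raise ValueError("extended and mode should not be set together.")
    if mode is None:
        # Legacy bool behavior: True -> extended plan, otherwise simple plan.
        mode = "extended" if extended else "simple"
    return mode

print(_resolve_explain_mode("cost"))           # cost
print(_resolve_explain_mode(True))             # extended
print(_resolve_explain_mode())                 # simple
print(_resolve_explain_mode(extended="cost"))  # cost
```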

Why are the changes needed?

To provide consistent API support across the Scala and Python APIs.

Does this PR introduce any user-facing change?

No, these are only changes in unreleased branches.
If this lands on master only, then yes, users will be able to set the mode via `df.explain("...")` in Spark 3.1.

After this PR:

```python
>>> spark.range(10).explain("cost")
```
```
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12)), Statistics(sizeInBytes=80.0 B)

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
```
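
For reference, assuming the string is passed through to the same explain-mode parsing as Scala's `Dataset.explain(mode: String)` in Spark 3.0, the accepted mode strings are `simple`, `extended`, `codegen`, `cost` and `formatted`:

```python
# Assuming df.explain(mode=...) accepts the same mode strings as the Scala side.
df = spark.range(10)
df.explain(mode="simple")     # physical plan only
df.explain(mode="extended")   # parsed, analyzed, optimized and physical plans
df.explain(mode="codegen")    # physical plan plus generated code, if any
df.explain(mode="cost")       # optimized logical plan with statistics
df.explain(mode="formatted")  # plan outline followed by node details
```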

How was this patch tested?

A unit test was added, and the following cases were also tested manually to make sure they work as intended:

```python
spark.range(10).explain(True)
spark.range(10).explain(False)
spark.range(10).explain("cost")
spark.range(10).explain(extended="cost")
spark.range(10).explain(mode="cost")
spark.range(10).explain()
spark.range(10).explain(True, "cost")
spark.range(10).explain(1.0)
```
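
Since `explain()` prints its output rather than returning it, a unit test for these cases can capture stdout. A minimal sketch under that assumption (this is not the actual test added by this PR; the class and helper names are made up):

```python
import contextlib
import io
import unittest

from pyspark.sql import SparkSession


class ExplainModeTests(unittest.TestCase):
    """Hypothetical sketch; the real tests live under pyspark.sql.tests."""

    @classmethod
    def setUpClass(cls):
        cls.spark = SparkSession.builder.master("local[1]").getOrCreate()

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def _explain_output(self, *args, **kwargs):
        # explain() prints to stdout, so capture it to inspect the plans.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            self.spark.range(10).explain(*args, **kwargs)
        return buf.getvalue()

    def test_explain_cost_mode_positional(self):
        self.assertIn("Optimized Logical Plan", self._explain_output("cost"))

    def test_explain_cost_mode_keyword(self):
        self.assertIn("Optimized Logical Plan", self._explain_output(mode="cost"))


if __name__ == "__main__":
    unittest.main()
```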

@SparkQA commented Jun 3, 2020

Test build #123454 has finished for PR 28711 at commit 984f33b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member) left a comment

Thanks for the update! Looks reasonable.

@maropu (Member) commented Jun 3, 2020

btw, typo in the description?

This is also consistent with DataFrame.sample case.
                                       ^^^^^

@HyukjinKwon (Member, Author) commented Jun 3, 2020

Oh, I meant that this is also consistent with `DataFrame.sample` on the PySpark and Scala sides - it supports both `DataFrame.sample(1.0)` and `DataFrame.sample(False)`. Let me clarify.

@HyukjinKwon (Member, Author) commented

Thanks @maropu. Merged to master and branch-3.0!

HyukjinKwon added a commit that referenced this pull request Jun 3, 2020
…ase to be consistent with Scala side

Closes #28711 from HyukjinKwon/SPARK-31895.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit e1d5201)
Signed-off-by: HyukjinKwon <[email protected]>
@maropu (Member) commented Jun 3, 2020

hahaha, I see.

HyukjinKwon pushed a commit to databricks/koalas that referenced this pull request Jun 4, 2020
Since Spark 3.0 will support the `DataFrame.explain(extended: str)` case (apache/spark#28711), we can follow it.

```py
>>> df.spark.explain("extended")  # doctest: +ELLIPSIS
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
 ...
```
@HyukjinKwon deleted the SPARK-31895 branch July 27, 2020 07:44