@HyukjinKwon
Member

What changes were proposed in this pull request?

This PR uses python3 instead of python3.6 executable as a fallback in IntegratedUDFTestUtils.
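For reference, the fallback order can be sketched as below. This is a minimal illustration, not the actual code in IntegratedUDFTestUtils: the object name `PythonExecFallback` and the helper `resolvePythonExec` are hypothetical, and the environment is passed in as a map only so that the lookup is easy to exercise.

```scala
object PythonExecFallback {
  // Hypothetical helper (not part of Spark): picks the Python executable
  // from the given environment, preferring PYSPARK_DRIVER_PYTHON, then
  // PYSPARK_PYTHON, and finally falling back to the generic "python3".
  def resolvePythonExec(env: Map[String, String]): String =
    env.getOrElse(
      "PYSPARK_DRIVER_PYTHON",
      env.getOrElse("PYSPARK_PYTHON", "python3"))

  def main(args: Array[String]): Unit = {
    // With neither variable set, the fallback "python3" is used.
    println(resolvePythonExec(sys.env))
  }
}
```

With this order, setting either environment variable still overrides the fallback, so an environment that needs a specific interpreter can keep pinning one.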

Why are the changes needed?

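Currently, the pandas UDF tests are skipped in GitHub Actions. Python 3.8 is installed explicitly, but a python3.6 executable also appears to be available in the GitHub Actions build environment by default, so the tests pick it up and find the required packages missing: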

[info] - udf/postgreSQL/udf-case.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!!
...
[info] - udf/postgreSQL/udf-select_having.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!!
...

python3.6 was originally chosen so that Jenkins would pick one Python version explicitly; however, it looks like we're already using python3 in various places.

This change will also reduce the maintenance overhead when we deprecate or drop Python versions.
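To illustrate why the tests end up skipped, here is a rough sketch of an availability probe. It is an assumption about the general approach, not the actual check in IntegratedUDFTestUtils: the object `PythonModuleProbe` and the method `modulesAvailable` are hypothetical names. It runs the chosen executable with an import statement and treats any failure (missing executable or non-zero exit code) as "not available".

```scala
import scala.sys.process.{Process, ProcessLogger}
import scala.util.Try

object PythonModuleProbe {
  // Hypothetical helper (not part of Spark): returns true only if
  // `pythonExec` can be started and can import all of `modules`.
  def modulesAvailable(pythonExec: String, modules: Seq[String]): Boolean = {
    // Discard stdout/stderr of the child process.
    val silent = ProcessLogger((_: String) => ())
    val command = Seq(pythonExec, "-c", s"import ${modules.mkString(", ")}")
    // Starting a missing executable throws; Try turns that into a failure.
    Try(Process(command).!(silent)).toOption.contains(0)
  }

  def main(args: Array[String]): Unit = {
    val ok = modulesAvailable("python3", Seq("pandas", "pyarrow"))
    println(s"pandas and pyarrow available in python3: $ok")
  }
}
```

If the default executable name resolves to an interpreter without these packages, a probe like this returns false and the dependent tests are ignored rather than failed.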

Does this PR introduce any user-facing change?

No, dev-only.

How was this patch tested?

It should be tested in Jenkins and GitHub Actions environments here.

@HyukjinKwon HyukjinKwon changed the title from "[SPARK-32422][TESTS] Use python3 executable instead of python3.6 in IntegratedUDFTestUtils" to "[SPARK-32422][SQL][TESTS] Use python3 executable instead of python3.6 in IntegratedUDFTestUtils" on Jul 24, 2020
@HyukjinKwon
Member Author

I temporarily reverted 826689a here, and will bring it back afterwards, to confirm that the IntegratedUDFTestUtils and pandas-related tests are not being skipped.

@HyukjinKwon
Member Author

It is now tested properly: https://github.com/apache/spark/runs/905415145

...
[info] - udf/udf-inner-join.sql - Scalar Pandas UDF (527 milliseconds)
...
[info] - udf/udf-special-values.sql - Scalar Pandas UDF (747 milliseconds)

@SparkQA

SparkQA commented Jul 24, 2020

Test build #126472 has finished for PR 29217 at commit d72c4c1.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 24, 2020

Test build #126468 has finished for PR 29217 at commit c70e29f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Jul 24, 2020

Test build #126483 has finished for PR 29217 at commit d72c4c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  lazy val pythonExec: String = {
    val pythonExec = sys.env.getOrElse(
-     "PYSPARK_DRIVER_PYTHON", sys.env.getOrElse("PYSPARK_PYTHON", "python3.6"))
+     "PYSPARK_DRIVER_PYTHON", sys.env.getOrElse("PYSPARK_PYTHON", "python3"))
Member

Since this is a GitHub Actions-specific issue, shall we use export PYSPARK_PYTHON on the GitHub Actions side?

Member

@dongjoon-hyun dongjoon-hyun left a comment

We had better isolate GitHub Actions-specific changes in master.yml. Otherwise, we will hit this failure when we backport GitHub Actions to old branches.

@HyukjinKwon
Member Author

HyukjinKwon commented Jul 25, 2020

I think it's okay. Installing Python 3.8 in GitHub Actions overwrites python3, so the tests won't be skipped even when we backport, as long as there is an explicit Python 3 installation.

I wouldn't say this is only a GitHub-specific issue. The code used python3.6 for the Jenkins environment when I added it. That was to pick a Python version explicitly, but we're already using python3 to pick Python 3 in many places, such as our dev scripts.

This fixes the issue in GitHub Actions and also keeps the code consistent with other places. Plus, it will reduce the maintenance overhead when we drop or deprecate a minor Python version.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Got it. +1, LGTM. Thanks for the explanation.
Merged to master.

@HyukjinKwon
Member Author

Thank you @dongjoon-hyun

@dongjoon-hyun
Member

Just a question. Is this used in the R tests, too?

@HyukjinKwon
Member Author

Nope, it isn't. I believe this is irrelevant to the R test results.

@dongjoon-hyun
Member

Thank you for confirmation~

@HyukjinKwon HyukjinKwon deleted the SPARK-32422 branch July 27, 2020 07:43
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
… in IntegratedUDFTestUtils

### What changes were proposed in this pull request?

This PR uses `python3` instead of `python3.6` executable as a fallback in `IntegratedUDFTestUtils`.

### Why are the changes needed?

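Currently, the pandas UDF tests are skipped in GitHub Actions. Python 3.8 is installed explicitly, but a `python3.6` executable also appears to be available in the GitHub Actions build environment by default, so the tests pick it up and find the required packages missing: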

```
[info] - udf/postgreSQL/udf-case.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!!
...
[info] - udf/postgreSQL/udf-select_having.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!!
...
```

`python3.6` was originally chosen so that Jenkins would pick one Python version explicitly; however, it looks like we're already using `python3` in various places.

This change will also reduce the maintenance overhead when we deprecate or drop Python versions.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It should be tested in Jenkins and GitHub Actions environments here.

Closes apache#29217 from HyukjinKwon/SPARK-32422.

Authored-by: HyukjinKwon
Signed-off-by: Dongjoon Hyun
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020