Conversation

@dongjoon-hyun (Member) commented Nov 9, 2021

What changes were proposed in this pull request?

This PR is a follow-up of #34526 that adjusts one additional `pyspark.rdd` doctest.

```python
- >>> b''.join(result).decode('utf-8')
+ >>> ''.join([r.decode('utf-8') if isinstance(r, bytes) else r for r in result])
```
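The changed expression works on both the old (bytes) and new (str) shapes of `result`. A minimal sketch of why the new doctest expression is version-neutral, using the two simulated outputs from the PR description (the helper name `join_lines` is illustrative, not part of the patch):

```python
def join_lines(result):
    """Join lines that may be bytes (Python 3.8/3.9) or str (Python 3.10+)."""
    return ''.join(r.decode('utf-8') if isinstance(r, bytes) else r
                   for r in result)

# Simulated doctest outputs from the two Python versions:
print(join_lines([b'bar\n', b'foo\n']))  # Python 3.8/3.9 shape
print(join_lines(['bar\n', 'foo\n']))    # Python 3.10 shape
```

The old expression, `b''.join(result).decode('utf-8')`, raises a `TypeError` on Python 3.10 because `bytes.join` cannot accept `str` items.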

Why are the changes needed?

**Python 3.8/3.9**

```python
Using Python version 3.8.12 (default, Nov  8 2021 17:15:19)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636432954207).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
[b'bar\n', b'foo\n']
```

**Python 3.10**

```python
Using Python version 3.10.0 (default, Oct 29 2021 14:35:18)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636433378727).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
['bar\n', 'foo\n']
```
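The bytes-vs-str difference comes from `fileinput.hook_compressed`, not from Spark, so the read side of the sessions above can be reproduced without a SparkContext. A standalone sketch (the gzip "part" file is written by hand here, standing in for `saveAsTextFile` output):

```python
import gzip
import os
import tempfile
from fileinput import input, hook_compressed
from glob import glob

# Write a gzip "part" file like the one saveAsTextFile would produce.
tmpdir = tempfile.mkdtemp()
with gzip.open(os.path.join(tmpdir, 'part-00000.gz'), 'wb') as f:
    f.write(b'foo\nbar\n')

# Read it back the way the doctest does; hook_compressed yields bytes on
# Python 3.8/3.9 and str on Python 3.10+ (per the PR description).
with input(glob(tmpdir + '/part*.gz'), openhook=hook_compressed) as fi:
    result = sorted(fi)

# The isinstance check normalizes the lines on every supported version.
text = ''.join(r.decode('utf-8') if isinstance(r, bytes) else r for r in result)
print(text)
```

On any of the versions discussed, `text` ends up as `'bar\nfoo\n'`, which is what lets a single doctest expectation cover both behaviors.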

Does this PR introduce any user-facing change?

No.

How was this patch tested?

```
$ python/run-tests --testnames pyspark.rdd
```

@SparkQA commented Nov 9, 2021

Test build #145018 has finished for PR 34529 at commit 795f083.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

Merged to master.

@SparkQA commented Nov 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49490/

@dongjoon-hyun (Member, Author)

Thank you, @HyukjinKwon !

@dongjoon-hyun deleted the SPARK-37244-2 branch November 9, 2021 06:51
@SparkQA commented Nov 9, 2021

Kubernetes integration test status: failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49490/

sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
This PR aims to support building and running tests on Python 3.10.

Python 3.10 added many new features and breaking changes.
- https://docs.python.org/3/whatsnew/3.10.html

This PR is a follow-up of apache#34526 that adjusts one additional `pyspark.rdd` doctest.

```python
- >>> b''.join(result).decode('utf-8')
+ >>> ''.join([r.decode('utf-8') if isinstance(r, bytes) else r for r in result])
```

**Python 3.8/3.9**
```python
Using Python version 3.8.12 (default, Nov  8 2021 17:15:19)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636432954207).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
[b'bar\n', b'foo\n']
```

**Python 3.10**
```python
Using Python version 3.10.0 (default, Oct 29 2021 14:35:18)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636433378727).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
['bar\n', 'foo\n']
```

No.

```
$ python/run-tests --testnames pyspark.rdd
```

Closes apache#34529 from dongjoon-hyun/SPARK-37244-2.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 47ceae4)
Signed-off-by: Dongjoon Hyun <[email protected]>