
Conversation

@GuoPhilipse (Member) commented Jun 26, 2020

What changes were proposed in this pull request?

Pin an American timezone in the timestamp_seconds doctest.

Why are the changes needed?

The timestamp_seconds doctest in functions.py relied on the default timezone to produce its expected result. For example:

>>> time_df = spark.createDataFrame([(1230219000,)], ['unix_time'])
>>> time_df.select(timestamp_seconds(time_df.unix_time).alias('ts')).collect()
[Row(ts=datetime.datetime(2008, 12, 25, 7, 30))]

But when the test runs under a non-American timezone, it produces a different result.

For example, when the current timezone is set to Asia/Shanghai, the result is

[Row(ts=datetime.datetime(2008, 12, 25, 23, 30))]

So by pinning the timezone to one specific area, the test case always produces the same expected result no matter where it runs, as sketched below.
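
A minimal sketch of what the pinned doctest could look like, assuming the fix sets spark.sql.session.timeZone around the example (the exact mechanism is an assumption here; see @MaxGekk's caveat below about this config):

>>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")  # assumed pinning mechanism
>>> time_df = spark.createDataFrame([(1230219000,)], ['unix_time'])
>>> time_df.select(timestamp_seconds(time_df.unix_time).alias('ts')).collect()
[Row(ts=datetime.datetime(2008, 12, 25, 7, 30))]
>>> spark.conf.unset("spark.sql.session.timeZone")  # restore the default for later tests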

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test

@GuoPhilipse changed the title from [SPARK-32088] fix timezone issue to [SPARK-32088][PySpark] fix timezone issue on Jun 26, 2020
@HyukjinKwon (Member)

ok to test

@HyukjinKwon changed the title from [SPARK-32088][PySpark] fix timezone issue to [SPARK-32088][PYTHON] Pin the timezone in timestamp_seconds doctest on Jun 26, 2020
@HyukjinKwon (Member)

@GuoPhilipse, can you elaborate a bit more in the PR description about why it fails with which output?

@GuoPhilipse (Member, Author)

> @GuoPhilipse, can you elaborate a bit more in the PR description about why it fails with which output?

Sure, will improve it.

@SparkQA commented Jun 26, 2020

Test build #124543 has finished for PR 28932 at commit 233ac9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @GuoPhilipse and @HyukjinKwon !
Merged to master.

@MaxGekk (Member) commented Jun 30, 2020

Actually, the SQL config spark.sql.session.timeZone is not used in PySpark collect() at all. The follow-up PR #28959 fixes the problem of running the example in a time zone different from America/Los_Angeles.
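
For context: PySpark's collect() converts internal timestamp values to Python datetimes on the driver, presumably using the Python process's local timezone rather than the SQL session config. A hypothetical sketch of pinning the process-local timezone instead (not necessarily the exact change in #28959):

>>> import os, time
>>> os.environ['TZ'] = 'America/Los_Angeles'  # pin the Python process's local timezone
>>> time.tzset()  # POSIX-only: apply the TZ change to subsequent local-time conversions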
