
Conversation

@MaxGekk (Member) commented Jun 30, 2020

What changes were proposed in this pull request?

Modify the example for timestamp_seconds and replace collect() with show().

Why are the changes needed?

The SQL config spark.sql.session.timeZone doesn't influence collect() in the example. The code below demonstrates that:

$ export TZ="UTC"
>>> from pyspark.sql.functions import timestamp_seconds
>>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>>> time_df = spark.createDataFrame([(1230219000,)], ['unix_time'])
>>> time_df.select(timestamp_seconds(time_df.unix_time).alias('ts')).collect()
[Row(ts=datetime.datetime(2008, 12, 25, 15, 30))]

The expected time is 07:30 but we get 15:30: collect() converts the timestamp via the local time zone (TZ=UTC here) and ignores the session time zone.
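For contrast, a sketch of the fixed example using show(), which does format timestamps in the session time zone (expected output reconstructed from that behavior, not a captured run):

>>> from pyspark.sql.functions import timestamp_seconds
>>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>>> time_df = spark.createDataFrame([(1230219000,)], ['unix_time'])
>>> time_df.select(timestamp_seconds(time_df.unix_time).alias('ts')).show()
+-------------------+
|                 ts|
+-------------------+
|2008-12-25 07:30:00|
+-------------------+
>>> spark.conf.unset("spark.sql.session.timeZone")

With show(), changing spark.sql.session.timeZone visibly changes the rendered timestamp, so the doctest actually exercises the config.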

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the modified example via:

$ ./python/run-tests --modules=pyspark-sql

@MaxGekk MaxGekk changed the title [SPARK-32088][PYTHON][FOLLOWUP] Replace collect() by show() in the timestamp_seconds example [SPARK-32088][PYTHON][FOLLOWUP] Replace collect() by show() in the example for timestamp_seconds Jun 30, 2020
@jiangxb1987 (Contributor) left a comment

LGTM pending Jenkins tests

@SparkQA commented Jun 30, 2020

Test build #124688 has finished for PR 28959 at commit aa6a0e1.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Jun 30, 2020

It seems the build failure is not related to the changes:

[error] /home/jenkins/workspace/SparkPullRequestBuilder@6/core/src/main/scala/org/apache/spark/SSLOptions.scala:71: type Server is not a member of object org.eclipse.jetty.util.ssl.SslContextFactory
[error]       val sslContextFactory = new SslContextFactory.Server()
[error]                                                     ^
[info] No documentation generated with unsuccessful compiler run
[error] one error found

@dongjoon-hyun (Member) commented Jun 30, 2020

Hi, @MaxGekk. Are you sure that is due to #28863? Is it a build error or a test error? I need your confirmation.

@dongjoon-hyun (Member) commented

Hi, @MaxGekk and @jiangxb1987. Could you give me more hints?

@dongjoon-hyun (Member) commented

So far, I'm observing the following. I'll keep monitoring, but it would be great if you could give me more evidence so that I can help you, @MaxGekk and @jiangxb1987.

  1. GitHub Actions is healthy for all profile combinations.
  2. The Jenkins job is running on the latest master branch.

@HyukjinKwon (Member) commented

retest this please

@HyukjinKwon (Member) commented

Sure, let's block merging until the test results come in.

@SparkQA commented Jul 1, 2020

Test build #124718 has finished for PR 28959 at commit aa6a0e1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) commented

retest this please

def timestamp_seconds(col):
    """
    >>> from pyspark.sql.functions import timestamp_seconds
    >>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
Contributor commented on the diff:

Do we set the timezone in the PySpark doctests, like the Scala-side tests do?

Contributor commented:

I'm wondering whether this is the common approach for all timestamp-related PySpark doctests.

Member commented:

Yeah, we can do that too. Let's set it when we face another similar issue; it's not difficult to do.
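For reference, a minimal sketch of how a doctest suite could pin the session time zone and restore it afterwards (a hypothetical helper, not code from this PR):

from contextlib import contextmanager

from pyspark.sql import SparkSession


@contextmanager
def session_time_zone(spark, tz):
    # Temporarily pin spark.sql.session.timeZone, restoring the previous
    # value (or unsetting it) on exit so other tests are unaffected.
    old = spark.conf.get("spark.sql.session.timeZone", None)
    spark.conf.set("spark.sql.session.timeZone", tz)
    try:
        yield
    finally:
        if old is None:
            spark.conf.unset("spark.sql.session.timeZone")
        else:
            spark.conf.set("spark.sql.session.timeZone", old)


# Usage: wrap timestamp-related doctests so their output is deterministic.
spark = SparkSession.builder.master("local[1]").getOrCreate()
with session_time_zone(spark, "America/Los_Angeles"):
    pass  # run timestamp doctests here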

@SparkQA commented Jul 1, 2020

Test build #124729 has finished for PR 28959 at commit aa6a0e1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) commented

retest this please

@SparkQA commented Jul 1, 2020

Test build #124735 has finished for PR 28959 at commit aa6a0e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) commented

Merged to master.

@HyukjinKwon (Member) commented

FYI,

[error] /home/jenkins/workspace/SparkPullRequestBuilder@6/core/src/main/scala/org/apache/spark/SSLOptions.scala:71: type Server is not a member of object org.eclipse.jetty.util.ssl.SslContextFactory
[error]       val sslContextFactory = new SslContextFactory.Server()
[error]                                                     ^
[info] No documentation generated with unsuccessful compiler run
[error] one error found

This is from unidoc when you generate the documentation. It sort of compiles Scala files back to Java files and extracts the documentation from there. It's kind of buggy from time to time.
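To reproduce that documentation step locally, something like the following should trigger the same genjavadoc-based build (assuming a full Spark checkout; the exact task Jenkins runs may differ):

$ ./build/sbt unidoc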

@MaxGekk MaxGekk deleted the SPARK-32088-fix-timezone-issue-followup branch December 11, 2020 20:26