@beliefer
Contributor

What changes were proposed in this pull request?

This PR makes the Sequence expression support ANSI intervals as the step expression.
If the start and stop expressions are of TimestampType, the step expression can be a year-month or a day-time interval.
If the start and stop expressions are of DateType, the step expression must be a year-month interval.
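
As a rough illustration of the date case described above, here is a minimal sketch in plain Scala using only `java.time`. It is not Spark's implementation: `MonthStepSketch` and `dateSequence` are hypothetical names introduced for this example, and the choice to add `i * step` to the start value (rather than repeatedly adding to the previous element) is an assumption made so that end-of-month dates do not drift.

```scala
import java.time.{LocalDate, Period}

object MonthStepSketch {
  // Hypothetical helper (not part of Spark): builds an inclusive, ascending
  // sequence of dates from `start` to `stop`, stepping by whole months.
  def dateSequence(start: LocalDate, stop: LocalDate, step: Period): List[LocalDate] = {
    require(step.toTotalMonths > 0 && step.getDays == 0,
      "step must be a positive year-month period")
    Iterator.from(0)
      // Always offset from `start`, so Jan 31 + 2 months is Mar 31, not Mar 28.
      .map(i => start.plusMonths(i * step.toTotalMonths))
      .takeWhile(d => !d.isAfter(stop))
      .toList
  }
}
```

For example, `MonthStepSketch.dateSequence(LocalDate.of(2018, 1, 1), LocalDate.of(2018, 3, 1), Period.ofMonths(1))` yields 2018-01-01, 2018-02-01 and 2018-03-01.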

Why are the changes needed?

Extends the functionality of the Sequence expression.

Does this PR introduce any user-facing change?

Yes. Users can use ANSI intervals as the step expression of the Sequence expression.

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label Apr 23, 2021
@beliefer beliefer changed the title Accept ANSI intervals by the Sequence expression [SPARK-35088][SQL] Accept ANSI intervals by the Sequence expression Apr 23, 2021
SparkQA commented Apr 23, 2021

Test build #137853 has finished for PR 32311 at commit 1e96d54.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 23, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42383/

SparkQA commented Apr 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42388/

SparkQA commented Apr 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42388/

SparkQA commented Apr 23, 2021

Test build #137858 has finished for PR 32311 at commit 89a34bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@MaxGekk left a comment

Could you update the comment:

then the step expression must resolve to the 'interval' type, otherwise to the same type

Here, 'interval' relates to CalendarIntervalType.


checkEvaluation(new Sequence(
Literal(Timestamp.valueOf("2018-01-01 00:00:00")),
Literal(Timestamp.valueOf("2018-01-02 00:00:01")),
Member

Why did you change the test for CalendarInterval? It seems it checks a specific case.

Contributor Author

I'm dazzled.

Timestamp.valueOf("2018-01-01 12:00:00"),
Timestamp.valueOf("2018-01-01 00:00:00")))

checkEvaluation(new Sequence(
Member

nit: The test is already big enough. Does it make sense to put the new checks into a separate test and prepend the JIRA ID?

Literal(Date.valueOf("2018-01-01")),
Literal(Date.valueOf("2018-01-05")),
Literal(Period.ofDays(2))),
EmptyRow, "sequence step must be a day interval if start and end values are dates")
Member

a day interval -> a day-time interval. Please call DayTimeIntervalType.typeName() here. Since the types are not stable yet, we may rename them in the near future.

@MaxGekk
Member

MaxGekk commented Apr 23, 2021

The error message is slightly confusing:

s"$prettyName only supports integral, timestamp or date types")

Type checking can fail because of an unsupported step type even when start and stop are one of: integral, timestamp or date.

Literal(Date.valueOf("1970-02-01")),
Literal(Period.ofMonths(-1))),
EmptyRow,
s"sequence boundaries: 0 to 2678400000000 by -${28 * MICROS_PER_DAY}")
Member

Hmm, 28 because we assume 28 days per month there?

By introducing the new interval types, we tried to avoid such an assumption.

Contributor Author

@beliefer Apr 25, 2021

Good find.
It seems we should change the behavior of CalendarInterval so that it avoids such an assumption.
Alternatively, the assumption looks like a bug in CalendarInterval.
If so, could we fix the bug in another PR?

Contributor Author

@beliefer Apr 25, 2021

Furthermore, microsPerDay is only used to estimate the length of the sequence. We just need to improve the exception message so that it does not output the assumed value.

Member

ok. I see.
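
To make the point of this thread concrete, the following is a small sketch in plain Scala (using `java.time` only, not Spark's `DateTimeUtils`) of why a fixed days-per-month factor can only serve as an estimate. `MonthLengthSketch` and `daysCovered` are hypothetical names introduced for this illustration.

```scala
import java.time.{LocalDate, Period}
import java.time.temporal.ChronoUnit

object MonthLengthSketch {
  // Hypothetical helper: number of days actually covered when `months`
  // calendar months are added to `start`.
  def daysCovered(start: LocalDate, months: Int): Long =
    ChronoUnit.DAYS.between(start, start.plus(Period.ofMonths(months)))
}
```

With this helper, one month starting at 1970-01-01 covers 31 days, which matches the 2678400000000 microsecond boundary in the test above, while one month starting at 1970-02-01 covers 28 days. A step printed as `28 * MICROS_PER_DAY` therefore reflects the assumed value rather than the real distance between elements.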


private class PeriodSequenceImpl[T: ClassTag]
(dt: IntegralType, scale: Long, fromLong: Long => T, zoneId: ZoneId)
(implicit num: Integral[T]) extends InternalSequenceBase(dt, scale, fromLong, zoneId) {
Member

I am not sure it is a good idea to use timestampAddInterval() in InternalSequenceBase.eval for adding months to dates. I guess DateTimeUtils.dateAddMonths() and DateTimeUtils.timestampAddInterval() can return different results, especially taking into account that dateAddMonths() does not depend on the current time zone.

Contributor Author

In fact, the current implementation uses DateTimeUtils.timestampAddInterval and its behavior seems fine.

Member

ok. Let's use timestampAddInterval since we don't have an example that could demonstrate any issues caused by timestampAddInterval().

SparkQA commented Apr 25, 2021

Test build #137913 has finished for PR 32311 at commit 1d38d02.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 25, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42435/

SparkQA commented Apr 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42436/

SparkQA commented Apr 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42436/

SparkQA commented Apr 25, 2021

Test build #137919 has finished for PR 32311 at commit cb361d1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

If start and stop expressions resolve to the 'date' or 'timestamp' type
then the step expression must resolve to the 'interval' type, otherwise to the same type
as the start and stop expressions.
then the step expression must resolve to the 'interval' or 'year-month' or 'day-time' type,
Member

nit:
'year-month' -> 'year-month interval'
'day-time' -> 'day-time interval'

Contributor Author

OK

|$prettyName uses the wrong parameter type. The parameter type must conform to:
|1. The start and stop expressions must resolve to the same type.
|2. If start and stop expressions resolve to the 'date' or 'timestamp' type
|then the step expression must resolve to the 'interval' or 'year-month' or
Member

Could you call YearMonthIntervalType.typeName here?

Contributor Author

OK

SparkQA commented Apr 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42440/

SparkQA commented Apr 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42440/

SparkQA commented Apr 25, 2021

Test build #137914 has finished for PR 32311 at commit 75562fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42453/

SparkQA commented Apr 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42453/

Member

@MaxGekk left a comment

+1, LGTM. GA passed. Merging to master.
Thank you, @beliefer .

@MaxGekk MaxGekk closed this in c0a3c0c Apr 26, 2021
@beliefer
Contributor Author

@MaxGekk Thanks for your review.

SparkQA commented Apr 26, 2021

Test build #137932 has finished for PR 32311 at commit 47853e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137932/
