Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jul 14, 2019

What changes were proposed in this pull request?

In the PR, I propose to use the plusMonths() method of LocalDate to add months to a date. This method adds the specified amount to the months field of LocalDate in three steps:

  1. Add the input months to the month-of-year field
  2. Check if the resulting date would be invalid
  3. Adjust the day-of-month to the last valid day if necessary

The difference between current behavior and propose one is in handling the last day of month in the original date. For example, adding 1 month to 2019-02-28 will produce 2019-03-28 comparing to the current implementation where the result is 2019-03-31.

The proposed behavior is implemented in MySQL and PostgreSQL.

How was this patch tested?

By existing test suites DateExpressionsSuite, DateFunctionsSuite and DateTimeUtilsSuite.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 14, 2019

@cloud-fan Please, take a look at the PR.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I favor this change as less surprising and more consistent with Java APIs too.


- Since Spark 3.0, substitution order of nested WITH clauses is changed and an inner CTE definition takes precedence over an outer. In version 2.4 and earlier, `WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2` returns `1` while in version 3.0 it returns `2`. The previous behaviour can be restored by setting `spark.sql.legacy.ctePrecedence.enabled` to `true`.

- Since Spark 3.0, the `add_months` function does not adjust the resulting date to a last day of month if the original date is a last day of month. The resulting date is adjust to a last day of month only if it is invalid. For example, `select add_months(DATE'2019-02-28', 1)` produces `2019-03-28` but `select add_months(DATE'2019-01-31', 1)` produces `2019-02-28`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is adjust -> is adjusted.
So previously, adding a month to 2019-02-28 resulted in 2019-03-31? It might be worth clarifying that too in your example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So previously, adding a month to 2019-02-28 resulted in 2019-03-31?

Yes, it does:

scala> spark.sql("select add_months(DATE'2019-02-28', 1)").show
+--------------------------------+
|add_months(DATE '2019-02-28', 1)|
+--------------------------------+
|                      2019-03-31|
+--------------------------------+

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the SQL migration guide, and add the example.

@SparkQA
Copy link

SparkQA commented Jul 14, 2019

Test build #107649 has finished for PR 25153 at commit 668608c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • final class AddColumn implements ColumnChange
  • final class RenameColumn implements ColumnChange
  • final class UpdateColumnType implements ColumnChange
  • final class UpdateColumnComment implements ColumnChange
  • final class DeleteColumn implements ColumnChange
  • case class UnresolvedRelation(
  • case class AlterTable(
  • case class AlterTableExec(

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 14, 2019

The latest build failed on CollectionExpressionsSuite, in particular on:

      checkEvaluation(new Sequence(
        Literal(Date.valueOf("2018-01-31")),
        Literal(Date.valueOf("2018-04-30")),
        Literal(CalendarInterval.fromString("interval 1 month"))),
        Seq(
          Date.valueOf("2018-01-31"),
          Date.valueOf("2018-02-28"),
          Date.valueOf("2018-03-31"),
          Date.valueOf("2018-04-30")))

because current implementation incrementally adds steps of month and micros to previous value. So, such implementation produces:

Seq(
          Date.valueOf("2018-01-31"),
          Date.valueOf("2018-02-28"),
          Date.valueOf("2018-03-28"),
          Date.valueOf("2018-04-28"))

I am going to change this in TemporalSequenceImpl to arr[i+1] = add_monts(start, i * stepMonth, i * stepMicros, timeZone)

@SparkQA
Copy link

SparkQA commented Jul 14, 2019

Test build #107652 has finished for PR 25153 at commit f1a4241.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

      checkEvaluation(new Sequence(
        Literal(Date.valueOf("2018-01-31")),
        Literal(Date.valueOf("2018-04-30")),
        Literal(CalendarInterval.fromString("interval 1 month"))),
        Seq(
          Date.valueOf("2018-01-31"),
          Date.valueOf("2018-02-28"),
          Date.valueOf("2018-03-31"),
          Date.valueOf("2018-04-30")))

Is this a valid test? Do other databases have the same behavior?

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 15, 2019

@cloud-fan This feature was borrowed from prestodb #21155 . Looking at the issue prestodb/presto#10765 , the expected values in the test are valid, it seems.

@wangyum
Copy link
Member

wangyum commented Jul 15, 2019

Teradata also returns 2019-03-28 if add a month to 2019-02-28:
image

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f241fc7 Jul 15, 2019
vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019
## What changes were proposed in this pull request?

In the PR, I propose to use the `plusMonths()` method of `LocalDate` to add months to a date. This method adds the specified amount to the months field of `LocalDate` in three steps:
1. Add the input months to the month-of-year field
2. Check if the resulting date would be invalid
3. Adjust the day-of-month to the last valid day if necessary

The difference between current behavior and propose one is in handling the last day of month in the original date. For example, adding 1 month to `2019-02-28` will produce `2019-03-28` comparing to the current implementation where the result is `2019-03-31`.

The proposed behavior is implemented in MySQL and PostgreSQL.

## How was this patch tested?

By existing test suites `DateExpressionsSuite`, `DateFunctionsSuite` and `DateTimeUtilsSuite`.

Closes apache#25153 from MaxGekk/add-months.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

- Since Spark 3.0, substitution order of nested WITH clauses is changed and an inner CTE definition takes precedence over an outer. In version 2.4 and earlier, `WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2` returns `1` while in version 3.0 it returns `2`. The previous behaviour can be restored by setting `spark.sql.legacy.ctePrecedence.enabled` to `true`.

- Since Spark 3.0, the `add_months` function adjusts the resulting date to a last day of month only if it is invalid. For example, `select add_months(DATE'2019-01-31', 1)` results `2019-02-28`. In Spark version 2.4 and earlier, the resulting date is adjusted when it is invalid, or the original date is a last day of months. For example, adding a month to `2019-02-28` resultes in `2019-03-31`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm .. shall we update this migration guide? Actually select add_months(DATE'2019-01-31', 1) returns 2019-02-28 in both the current master and the versions before Spark 2.4.x.

I think we should explicitly mention select add_months(DATE'2019-02-28', 1) case only:

scala> sql("select add_months(DATE'2019-02-28', 1)").show()
+--------------------------------+
|add_months(DATE '2019-02-28', 1)|
+--------------------------------+
|                      2019-03-31|
+--------------------------------+
scala> sql("select add_months(DATE'2019-02-28', 1)").show()
+--------------------------------+
|add_months(DATE '2019-02-28', 1)|
+--------------------------------+
|                      2019-03-28|
+--------------------------------+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me update it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants