[SPARK-28389][SQL] Use Java 8 API in add_months #25153
@cloud-fan Please, take a look at the PR.
srowen
left a comment
I favor this change as less surprising and more consistent with Java APIs too.
docs/sql-migration-guide-upgrade.md
Outdated
- Since Spark 3.0, substitution order of nested WITH clauses is changed and an inner CTE definition takes precedence over an outer. In version 2.4 and earlier, `WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2` returns `1` while in version 3.0 it returns `2`. The previous behaviour can be restored by setting `spark.sql.legacy.ctePrecedence.enabled` to `true`.
- Since Spark 3.0, the `add_months` function does not adjust the resulting date to a last day of month if the original date is a last day of month. The resulting date is adjust to a last day of month only if it is invalid. For example, `select add_months(DATE'2019-02-28', 1)` produces `2019-03-28` but `select add_months(DATE'2019-01-31', 1)` produces `2019-02-28`.
is adjust -> is adjusted.
So previously, adding a month to 2019-02-28 resulted in 2019-03-31? It might be worth clarifying that too in your example.
So previously, adding a month to 2019-02-28 resulted in 2019-03-31?
Yes, it does:
scala> spark.sql("select add_months(DATE'2019-02-28', 1)").show
+--------------------------------+
|add_months(DATE '2019-02-28', 1)|
+--------------------------------+
|                      2019-03-31|
+--------------------------------+
I updated the SQL migration guide and added the example.
Test build #107649 has finished for PR 25153 at commit
The latest build failed on because the current implementation incrementally adds steps of months and micros to the previous value. So, such an implementation produces: I am going to change this in
Test build #107652 has finished for PR 25153 at commit
Is this a valid test? Do other databases have the same behavior?
@cloud-fan This feature was borrowed from prestodb #21155 . Looking at the issue prestodb/presto#10765 , the expected values in the test are valid, it seems.
thanks, merging to master!
## What changes were proposed in this pull request?

In the PR, I propose to use the `plusMonths()` method of `LocalDate` to add months to a date. This method adds the specified amount to the months field of `LocalDate` in three steps:

1. Add the input months to the month-of-year field
2. Check if the resulting date would be invalid
3. Adjust the day-of-month to the last valid day if necessary

The difference between the current behavior and the proposed one is in handling the last day of month in the original date. For example, adding 1 month to `2019-02-28` will produce `2019-03-28`, compared to the current implementation where the result is `2019-03-31`.

The proposed behavior is implemented in MySQL and PostgreSQL.

## How was this patch tested?

By existing test suites `DateExpressionsSuite`, `DateFunctionsSuite` and `DateTimeUtilsSuite`.

Closes apache#25153 from MaxGekk/add-months.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
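The difference described in the commit message can be reproduced with plain `java.time`, without Spark. The following is a minimal sketch, not the code from the PR: `AddMonthsSketch`, `addMonths` and `addMonthsLegacy` are hypothetical names, and `addMonthsLegacy` is only an approximation of the pre-3.0 rule ("snap to the last day of month if the original date was a last day of month") as the migration note describes it.

```java
import java.time.LocalDate;

class AddMonthsSketch {
    // Proposed behavior: rely on LocalDate.plusMonths, which clamps the
    // day-of-month only when the shifted date would be invalid.
    static LocalDate addMonths(LocalDate date, int months) {
        return date.plusMonths(months);
    }

    // Assumed approximation of the Spark 2.4 rule: if the input is the last
    // day of its month, the result is forced to the last day of its month.
    static LocalDate addMonthsLegacy(LocalDate date, int months) {
        LocalDate shifted = date.plusMonths(months);
        boolean wasLastDay = date.getDayOfMonth() == date.lengthOfMonth();
        return wasLastDay ? shifted.withDayOfMonth(shifted.lengthOfMonth()) : shifted;
    }

    public static void main(String[] args) {
        LocalDate feb28 = LocalDate.of(2019, 2, 28);
        LocalDate jan31 = LocalDate.of(2019, 1, 31);
        System.out.println(addMonths(feb28, 1));       // 2019-03-28
        System.out.println(addMonthsLegacy(feb28, 1)); // 2019-03-31
        System.out.println(addMonths(jan31, 1));       // 2019-02-28
        System.out.println(addMonthsLegacy(jan31, 1)); // 2019-02-28
    }
}
```

Only the `2019-02-28` input separates the two rules; `2019-01-31` yields `2019-02-28` under both, which matches the outputs quoted in the review comments.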
- Since Spark 3.0, substitution order of nested WITH clauses is changed and an inner CTE definition takes precedence over an outer. In version 2.4 and earlier, `WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2` returns `1` while in version 3.0 it returns `2`. The previous behaviour can be restored by setting `spark.sql.legacy.ctePrecedence.enabled` to `true`.
- Since Spark 3.0, the `add_months` function adjusts the resulting date to a last day of month only if it is invalid. For example, `select add_months(DATE'2019-01-31', 1)` results `2019-02-28`. In Spark version 2.4 and earlier, the resulting date is adjusted when it is invalid, or the original date is a last day of months. For example, adding a month to `2019-02-28` resultes in `2019-03-31`.
Hm .. shall we update this migration guide? Actually select add_months(DATE'2019-01-31', 1) returns 2019-02-28 in both the current master and the versions before Spark 2.4.x.
I think we should explicitly mention select add_months(DATE'2019-02-28', 1) case only:
scala> sql("select add_months(DATE'2019-02-28', 1)").show()
+--------------------------------+
|add_months(DATE '2019-02-28', 1)|
+--------------------------------+
|                      2019-03-31|
+--------------------------------+
scala> sql("select add_months(DATE'2019-02-28', 1)").show()
+--------------------------------+
|add_months(DATE '2019-02-28', 1)|
+--------------------------------+
|                      2019-03-28|
+--------------------------------+
Let me update it soon.

What changes were proposed in this pull request?
In the PR, I propose to use the `plusMonths()` method of `LocalDate` to add months to a date. This method adds the specified amount to the months field of `LocalDate` in three steps:
1. Add the input months to the month-of-year field
2. Check if the resulting date would be invalid
3. Adjust the day-of-month to the last valid day if necessary
The difference between the current behavior and the proposed one is in handling the last day of month in the original date. For example, adding 1 month to `2019-02-28` will produce `2019-03-28`, compared to the current implementation where the result is `2019-03-31`.
The proposed behavior is implemented in MySQL and PostgreSQL.
How was this patch tested?
By existing test suites `DateExpressionsSuite`, `DateFunctionsSuite` and `DateTimeUtilsSuite`.