[SPARK-31879][SQL] Using GB as default Locale for datetime formatters
### What changes were proposed in this pull request?

This PR switches the default Locale from `US` to `GB`, changing the first day of the week from Sunday-started to Monday-started, the same as v2.4.

### Why are the changes needed?

#### cases
```sql
spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
2019-12-29 00:00:00
spark-sql> set spark.sql.legacy.timeParserPolicy=legacy;
spark.sql.legacy.timeParserPolicy	legacy
spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
2019-12-30 00:00:00
```

#### reasons

These week-based fields need a Locale to express their semantics, because the first day of the week varies from country to country.

From the Java doc of `WeekFields`:
```java
    /**
     * Gets the first day-of-week.
     * <p>
     * The first day-of-week varies by culture.
     * For example, the US uses Sunday, while France and the ISO-8601 standard use Monday.
     * This method returns the first day using the standard {@code DayOfWeek} enum.
     *
     * @return the first day-of-week, not null
     */
    public DayOfWeek getFirstDayOfWeek() {
        return firstDayOfWeek;
    }
```
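
To make this concrete, here is a minimal plain-JDK sketch (not Spark code; the class name is made up for illustration) comparing the first day-of-week that `WeekFields` reports for the `US` and `GB` locales — note that `Locale.UK` is the built-in constant for `en-GB`:

```java
import java.time.temporal.WeekFields;
import java.util.Locale;

public class FirstDayOfWeekDemo {
    public static void main(String[] args) {
        // The first day-of-week is a property of the locale, not of the calendar.
        System.out.println(WeekFields.of(Locale.US).getFirstDayOfWeek()); // SUNDAY
        System.out.println(WeekFields.of(Locale.UK).getFirstDayOfWeek()); // MONDAY
        // ISO-8601 (and most of Europe) also starts the week on Monday.
        System.out.println(WeekFields.ISO.getFirstDayOfWeek());          // MONDAY
    }
}
```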

But for the legacy `SimpleDateFormat`, the day-of-week is not localized:

```
u	Day number of week (1 = Monday, ..., 7 = Sunday)	Number	1
```
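
A small sketch (plain JDK, hypothetical class name) showing that `SimpleDateFormat`'s `u` gives the same answer regardless of locale — 2020-01-01 was a Wednesday, so `u` always yields 3:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class LegacyDayNumberDemo {
    public static void main(String[] args) {
        Calendar cal = new GregorianCalendar(2020, Calendar.JANUARY, 1); // a Wednesday
        // In SimpleDateFormat, 'u' is hard-wired to 1 = Monday ... 7 = Sunday;
        // the locale has no effect on the numbering.
        System.out.println(new SimpleDateFormat("u", Locale.US).format(cal.getTime()));     // 3
        System.out.println(new SimpleDateFormat("u", Locale.FRANCE).format(cal.getTime())); // 3
    }
}
```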

Currently, the default locale we use is `US`, where the week starts on Sunday, so the parsed result falls one day earlier.

For other countries, please refer to [First Day of the Week in Different Countries](http://chartsbin.com/view/41671)

With this change, the first-day-of-week calculation for these functions is restored when the default locale is used.
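
The same one-day shift can be reproduced directly with java.time's localized day-of-week pattern letter `e` — a minimal sketch (not Spark code; Spark internally translates its week-based pattern letters onto java.time equivalents):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LocalizedDayOfWeekDemo {
    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, 1, 1); // a Wednesday
        // 'e' is the localized day-of-week number: its value depends on which
        // day the locale treats as the start of the week.
        System.out.println(DateTimeFormatter.ofPattern("e", Locale.US).format(date)); // 4 (week starts Sunday)
        System.out.println(DateTimeFormatter.ofPattern("e", Locale.UK).format(date)); // 3 (week starts Monday)
    }
}
```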

### Does this PR introduce _any_ user-facing change?

Yes, but the behavior change restores that of v2.4.

### How was this patch tested?

Added unit tests.

Closes apache#28692 from yaooqinn/SPARK-31879.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
yaooqinn authored and cloud-fan committed Jun 3, 2020
commit c59f51bcc207725b8cbc4201df9367f874f5915c
```diff
@@ -117,7 +117,13 @@ class LegacySimpleDateFormatter(pattern: String, locale: Locale) extends LegacyD
 object DateFormatter {
   import LegacyDateFormats._
 
-  val defaultLocale: Locale = Locale.US
+  /**
+   * Before Spark 3.0, the first day-of-week is always Monday. Since Spark 3.0, it depends on the
+   * locale.
+   * We pick GB as the default locale instead of US, to be compatible with Spark 2.x, as US locale
+   * uses Sunday as the first day-of-week. See SPARK-31879.
+   */
+  val defaultLocale: Locale = new Locale("en", "GB")
 
   val defaultPattern: String = "yyyy-MM-dd"
```
```diff
@@ -278,7 +278,13 @@ object LegacyDateFormats extends Enumeration {
 object TimestampFormatter {
   import LegacyDateFormats._
 
-  val defaultLocale: Locale = Locale.US
+  /**
+   * Before Spark 3.0, the first day-of-week is always Monday. Since Spark 3.0, it depends on the
+   * locale.
+   * We pick GB as the default locale instead of US, to be compatible with Spark 2.x, as US locale
+   * uses Sunday as the first day-of-week. See SPARK-31879.
+   */
+  val defaultLocale: Locale = new Locale("en", "GB")
 
   def defaultPattern(): String = s"${DateFormatter.defaultPattern} HH:mm:ss"
```
4 changes: 4 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/datetime.sql
```diff
@@ -164,3 +164,7 @@ select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy
 select from_unixtime(1, 'yyyyyyyyyyy-MM-dd');
 select date_format(timestamp '2018-11-17 13:33:33', 'yyyyyyyyyy-MM-dd HH:mm:ss');
 select date_format(date '2018-11-17', 'yyyyyyyyyyy-MM-dd');
+
+-- SPARK-31879: the first day of week
+select date_format('2020-01-01', 'YYYY-MM-dd uu');
+select date_format('2020-01-01', 'YYYY-MM-dd uuuu');
```
```diff
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 119
+-- Number of queries: 121
 
 
 -- !query
@@ -1025,3 +1025,19 @@ struct<>
 -- !query output
 org.apache.spark.SparkUpgradeException
 You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'yyyyyyyyyyy-MM-dd' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
+
+
+-- !query
+select date_format('2020-01-01', 'YYYY-MM-dd uu')
+-- !query schema
+struct<date_format(CAST(2020-01-01 AS TIMESTAMP), YYYY-MM-dd uu):string>
+-- !query output
+2020-01-01 03
+
+
+-- !query
+select date_format('2020-01-01', 'YYYY-MM-dd uuuu')
+-- !query schema
+struct<date_format(CAST(2020-01-01 AS TIMESTAMP), YYYY-MM-dd uuuu):string>
+-- !query output
+2020-01-01 Wednesday
```
```diff
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 119
+-- Number of queries: 121
 
 
 -- !query
@@ -980,3 +980,19 @@ select date_format(date '2018-11-17', 'yyyyyyyyyyy-MM-dd')
 struct<date_format(CAST(DATE '2018-11-17' AS TIMESTAMP), yyyyyyyyyyy-MM-dd):string>
 -- !query output
 00000002018-11-17
+
+
+-- !query
+select date_format('2020-01-01', 'YYYY-MM-dd uu')
+-- !query schema
+struct<date_format(CAST(2020-01-01 AS TIMESTAMP), YYYY-MM-dd uu):string>
+-- !query output
+2020-01-01 03
+
+
+-- !query
+select date_format('2020-01-01', 'YYYY-MM-dd uuuu')
+-- !query schema
+struct<date_format(CAST(2020-01-01 AS TIMESTAMP), YYYY-MM-dd uuuu):string>
+-- !query output
+2020-01-01 0003
```
18 changes: 17 additions & 1 deletion sql/core/src/test/resources/sql-tests/results/datetime.sql.out
```diff
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 119
+-- Number of queries: 121
 
 
 -- !query
@@ -997,3 +997,19 @@ struct<>
 -- !query output
 org.apache.spark.SparkUpgradeException
 You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'yyyyyyyyyyy-MM-dd' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
+
+
+-- !query
+select date_format('2020-01-01', 'YYYY-MM-dd uu')
+-- !query schema
+struct<date_format(CAST(2020-01-01 AS TIMESTAMP), YYYY-MM-dd uu):string>
+-- !query output
+2020-01-01 03
+
+
+-- !query
+select date_format('2020-01-01', 'YYYY-MM-dd uuuu')
+-- !query schema
+struct<date_format(CAST(2020-01-01 AS TIMESTAMP), YYYY-MM-dd uuuu):string>
+-- !query output
+2020-01-01 Wednesday
```