Merge remote-tracking branch 'apache/master' into SPARK-35780-full-range-datetime
linhongliu-db committed Jul 12, 2021
commit dda61360cd885c490f56aaba5dfd8b384dc422d6
@@ -253,7 +253,7 @@ object DateTimeUtils {
// A Long is able to represent a timestamp within [+-]200 thousand years
val maxDigitsYear = 6
// For the nanosecond part, more than 6 digits is allowed, but will be truncated.
segment == 6 || (segment == 0 && digits > 0 && digits <= maxDigitsYear) ||
segment == 6 || (segment == 0 && digits >= 4 && digits <= maxDigitsYear) ||
(segment != 0 && segment != 6 && digits <= 2)
@linhongliu-db (Contributor, Author) commented on Jul 7, 2021:

Segments other than the year were already allowed to have 0 digits before this PR, so I didn't add zero-digit checks for those segments.
For example, both before and after this PR, the queries below are valid:

select cast('12::' as timestamp); -- output: 2021-07-07 12:00:00
select cast('T' as timestamp); -- output: 2021-07-07 00:00:00
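
As a minimal sketch (my own illustration, not code from the PR), the timestamp-side check discussed here behaves as follows; the standalone def and the asserts are for demonstration only:

// Illustrative standalone copy of the timestamp digit check
// (in Spark it is a helper inside DateTimeUtils; names here are for demo only).
def isValidDigits(segment: Int, digits: Int): Boolean = {
  // A Long can represent a timestamp within [+-]200 thousand years.
  val maxDigitsYear = 6
  // Segment 6 is the fractional second: extra digits are truncated later.
  segment == 6 ||
    // Segment 0 is the year: it now requires 4 to 6 digits.
    (segment == 0 && digits >= 4 && digits <= maxDigitsYear) ||
    // Other segments accept 0 to 2 digits, so "12::" and "T" still parse.
    (segment != 0 && segment != 6 && digits <= 2)
}

assert(!isValidDigits(0, 3)) // "123" is no longer accepted as a bare year
assert(isValidDigits(2, 0))  // an empty segment, as in "12::", is still fine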

Contributor replied:

Good catch, we should fail it. Let's do it in another PR.

}
if (s == null || s.trimAll().numBytes() == 0) {
@@ -527,7 +527,7 @@ object DateTimeUtils {
def isValidDigits(segment: Int, digits: Int): Boolean = {
// An integer is able to represent a date within [+-]5 million years.
var maxDigitsYear = 7
Contributor commented:

Can I implement a configuration item that controls the range of digits allowed for the year?

I found that when writing to tables in different formats, the results behave differently:

create table t(c1 date) stored as textfile;
insert overwrite table t select cast('22022-05-01' as date);
select * from t; -- output: null

create table t(c1 date) stored as orcfile;
insert overwrite table t select cast('22022-05-01' as date);
select * from t; -- output: +22022-05-01

This is because ORC/Parquet store dates as integers, while textfile and sequencefile store them as text.


But if you use Hive JDBC, the query will fail, because java.sql.Date only supports 4-digit years.

Caused by: java.lang.IllegalArgumentException
  at java.sql.Date.valueOf(Date.java:143)
  at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:447
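
A minimal spark-shell sketch of the scenario (the table names t_text and t_orc are my own, and it assumes a SparkSession named spark with Hive support; the expected outputs are taken from the comment above, not re-verified):

// Text-backed table: the date is written as a string and reads back as null.
spark.sql("create table t_text(c1 date) stored as textfile")
spark.sql("insert overwrite table t_text select cast('22022-05-01' as date)")
spark.sql("select * from t_text").show()   // expected: null

// ORC-backed table: the date is stored as an integer (days since the epoch), so it round-trips.
spark.sql("create table t_orc(c1 date) stored as orc")
spark.sql("insert overwrite table t_orc select cast('22022-05-01' as date)")
spark.sql("select * from t_orc").show()    // expected: +22022-05-01

// The JDBC failure above happens on the client side: java.sql.Date.valueOf accepts only
// 4-digit years, so java.sql.Date.valueOf("22022-05-01") throws IllegalArgumentException.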

Contributor replied:

It's expected that not all data sources and BI clients support datetime values larger than 10000-01-01; the question is when the failure should happen.

It looks to me like the Hive table should fail to write 22022-05-01 with the textfile source, and Hive JDBC should fail on the client side, saying 22022-05-01 is not supported.

Contributor replied:

BTW, I don't think it's possible to add a Spark config to forbid large datetime values. Literals are just one place; there are many other datetime operations that may produce large datetime values, and they existed before this PR.
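
For illustration (my own example, not from the PR): even without large literals, ordinary datetime arithmetic can already produce values beyond year 9999, so a literal-only guard would not close the gap:

// Assumes a SparkSession named spark; no large literal appears in the query,
// yet the result is a date in the year 10000.
spark.sql("select date_add(date'9999-12-31', 1)").show()
// expected: a date in the year 10000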

Contributor replied:

Thanks for your explanation, that makes sense.

There may be some dates that users treated as invalid in previous Spark versions but that are handled normally in Spark 3.2, even though they are valid dates. I mention this because I didn't see this behavior change in the migration guide before noticing this PR.

Contributor replied:

Yeah, the impact on BI clients was missed, though strictly speaking BI clients are not part of Spark.

(segment == 0 && digits > 0 && digits <= maxDigitsYear) || (segment != 0 && digits <= 2)
(segment == 0 && digits >= 4 && digits <= maxDigitsYear) || (segment != 0 && digits <= 2)
}
if (s == null || s.trimAll().numBytes() == 0) {
return None
@@ -371,6 +371,7 @@ abstract class AnsiCastSuiteBase extends CastSuiteBase {
s"Cannot cast $str to TimestampType.")
}

checkCastWithParseError("123")
checkCastWithParseError("2015-03-18 123142")
checkCastWithParseError("2015-03-18T123123")
checkCastWithParseError("2015-03-18X")
@@ -410,6 +411,7 @@ abstract class AnsiCastSuiteBase extends CastSuiteBase {
test("SPARK-35720: cast invalid string input to timestamp without time zone") {
Seq("00:00:00",
"a",
"123",
"a2021-06-17",
"2021-06-17abc",
"2021-06-17 00:00:00ABC").foreach { invalidInput =>
@@ -569,6 +569,7 @@ class CastSuite extends CastSuiteBase {
test("SPARK-35720: cast invalid string input to timestamp without time zone") {
Seq("00:00:00",
"a",
"123",
"a2021-06-17",
"2021-06-17abc",
"2021-06-17 00:00:00ABC").foreach { invalidInput =>
@@ -138,9 +138,6 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {

val tz = TimeZone.getTimeZone(zid)
var c = Calendar.getInstance(tz)
checkEvaluation(
cast("123", TimestampType, Option(zid.getId)),
LocalDateTime.of(123, 1, 1, 0, 0, 0).atZone(zid).toInstant)
c.set(2015, 0, 1, 0, 0, 0)
c.set(Calendar.MILLISECOND, 0)
checkCastStringToTimestamp("2015", new Timestamp(c.getTimeInMillis))
@@ -977,8 +974,6 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
// The input string can contain date only
checkEvaluation(cast("2021-06-17", TimestampNTZType),
LocalDateTime.of(2021, 6, 17, 0, 0))
checkEvaluation(cast("123", TimestampNTZType),
LocalDateTime.of(123, 1, 1, 0, 0))
}

test("SPARK-35112: Cast string to day-time interval") {
@@ -142,6 +142,8 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
assert(toDate("2015.03.18").isEmpty)
assert(toDate("20150318").isEmpty)
assert(toDate("2015-031-8").isEmpty)
assert(toDate("015-03-18").isEmpty)
assert(toDate("015").isEmpty)
assert(toDate("1999 08 01").isEmpty)
assert(toDate("1999-08 01").isEmpty)
assert(toDate("1999 08").isEmpty)
@@ -151,8 +153,6 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {

test("SPARK-35780: support full range of date string") {
assert(toDate("02015-03-18").get === days(2015, 3, 18))
assert(toDate("015-03-18").get === days(15, 3, 18))
assert(toDate("015").get === days(15, 1, 1))
assert(toDate("02015").get === days(2015, 1, 1))
assert(toDate("-02015").get === days(-2015, 1, 1))
assert(toDate("999999-1-28").get === days(999999, 1, 28))
@@ -271,13 +271,15 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
expected = Option(date(2011, 5, 6, 7, 8, 9, 100000, zid = zid))
checkStringToTimestamp("2011-05-06 07:08:09.1000", expected)

checkStringToTimestamp("238", None)
checkStringToTimestamp("2015-03-18 123142", None)
checkStringToTimestamp("2015-03-18T123123", None)
checkStringToTimestamp("2015-03-18X", None)
checkStringToTimestamp("2015/03/18", None)
checkStringToTimestamp("2015.03.18", None)
checkStringToTimestamp("20150318", None)
checkStringToTimestamp("2015-031-8", None)
checkStringToTimestamp("015-01-18", None)
checkStringToTimestamp("2015-03-18T12:03.17-20:0", None)
checkStringToTimestamp("2015-03-18T12:03.17-0:70", None)
checkStringToTimestamp("2015-03-18T12:03.17-1:0:0", None)
@@ -307,10 +309,8 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {

checkStringToTimestamp("-1969-12-31 16:00:00", Option(date(-1969, 12, 31, 16, zid = UTC)))
checkStringToTimestamp("02015-03-18 16:00:00", Option(date(2015, 3, 18, 16, zid = UTC)))
checkStringToTimestamp("015-03-18 16:00:00", Option(date(15, 3, 18, 16, zid = UTC)))
checkStringToTimestamp("000001", Option(date(1, 1, 1, 0, zid = UTC)))
checkStringToTimestamp("-000001", Option(date(-1, 1, 1, 0, zid = UTC)))
checkStringToTimestamp("238", Option(date(238, 1, 1, 0, zid = UTC)))
checkStringToTimestamp("00238", Option(date(238, 1, 1, 0, zid = UTC)))
checkStringToTimestamp("99999-03-01T12:03:17", Option(date(99999, 3, 1, 12, 3, 17, zid = UTC)))
checkStringToTimestamp("+12:12:12", None)
12 changes: 9 additions & 3 deletions sql/core/src/test/resources/sql-tests/inputs/datetime.sql
@@ -258,15 +258,21 @@ select to_timestamp_ntz('2021-06-25 10:11:12') - interval '20 15' day to hour;
select to_timestamp_ntz('2021-06-25 10:11:12') - interval '20 15:40' day to minute;
select to_timestamp_ntz('2021-06-25 10:11:12') - interval '20 15:40:32.99899999' day to second;

-- timestamp numeric fields constructor
SELECT make_timestamp(2021, 07, 11, 6, 30, 45.678);
SELECT make_timestamp(2021, 07, 11, 6, 30, 60.007);

-- datetime with year outside [0000-9999]
select date'999999-03-18';
select date'015';
select date'-1-1-28';
select date'-0001-1-28';
select date'0015';
select cast('015' as date);
select cast('2021-4294967297-11' as date);

select timestamp'-1969-12-31 16:00:00';
select timestamp'015-03-18 16:00:00';
select timestamp'0015-03-18 16:00:00';
select timestamp'-000001';
select timestamp'99999-03-18T12:03:17';
select cast('4294967297' as timestamp);
select cast('2021-01-01T12:30:4294967297.123456' as timestamp);

@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 203
-- Number of queries: 206


-- !query
@@ -1642,6 +1642,23 @@ struct<to_timestamp_ntz(2021-06-25 10:11:12) - INTERVAL '20 15:40:32.998999' DAY
2021-06-04 18:30:39.001001


-- !query
SELECT make_timestamp(2021, 07, 11, 6, 30, 45.678)
-- !query schema
struct<make_timestamp(2021, 7, 11, 6, 30, 45.678):timestamp>
-- !query output
2021-07-11 06:30:45.678


-- !query
SELECT make_timestamp(2021, 07, 11, 6, 30, 60.007)
-- !query schema
struct<>
-- !query output
java.time.DateTimeException
The fraction of sec must be zero. Valid range is [0, 60].


-- !query
select date'999999-03-18'
-- !query schema
@@ -1651,19 +1668,27 @@ struct<DATE '+999999-03-18':date>


-- !query
select date'015'
select date'-0001-1-28'
-- !query schema
struct<DATE '-0001-01-28':date>
-- !query output
-0001-01-28


-- !query
select date'0015'
-- !query schema
struct<DATE '0015-01-01':date>
-- !query output
0015-01-01


-- !query
select date'-1-1-28'
select cast('015' as date)
-- !query schema
struct<DATE '-0001-01-28':date>
struct<CAST(015 AS DATE):date>
-- !query output
-0001-01-28
0015-01-01


-- !query
@@ -1684,7 +1709,7 @@ struct<TIMESTAMP '-1969-12-31 16:00:00':timestamp>


-- !query
select timestamp'015-03-18 16:00:00'
select timestamp'0015-03-18 16:00:00'
-- !query schema
struct<TIMESTAMP '0015-03-18 16:00:00':timestamp>
-- !query output
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 203
-- Number of queries: 206


-- !query
@@ -1586,6 +1586,22 @@ struct<to_timestamp_ntz(2021-06-25 10:11:12) - INTERVAL '20 15:40:32.998999' DAY
2021-06-04 18:30:39.001001


-- !query
SELECT make_timestamp(2021, 07, 11, 6, 30, 45.678)
-- !query schema
struct<make_timestamp(2021, 7, 11, 6, 30, 45.678):timestamp>
-- !query output
2021-07-11 06:30:45.678


-- !query
SELECT make_timestamp(2021, 07, 11, 6, 30, 60.007)
-- !query schema
struct<make_timestamp(2021, 7, 11, 6, 30, 60.007):timestamp>
-- !query output
NULL


-- !query
select date'999999-03-18'
-- !query schema
@@ -1595,19 +1611,27 @@ struct<DATE '999999-03-18':date>


-- !query
select date'015'
select date'-0001-1-28'
-- !query schema
struct<DATE '0002-01-28':date>
-- !query output
0002-01-28


-- !query
select date'0015'
-- !query schema
struct<DATE '0015-01-01':date>
-- !query output
0015-01-01


-- !query
select date'-1-1-28'
select cast('015' as date)
-- !query schema
struct<DATE '0002-01-28':date>
struct<CAST(015 AS DATE):date>
-- !query output
0002-01-28
0015-01-01


-- !query
@@ -1627,7 +1651,7 @@ struct<TIMESTAMP '-1969-12-31 16:00:00':timestamp>


-- !query
select timestamp'015-03-18 16:00:00'
select timestamp'0015-03-18 16:00:00'
-- !query schema
struct<TIMESTAMP '0015-03-18 16:00:00':timestamp>
-- !query output
36 changes: 30 additions & 6 deletions sql/core/src/test/resources/sql-tests/results/datetime.sql.out
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 203
-- Number of queries: 206


-- !query
@@ -1594,6 +1594,22 @@ struct<to_timestamp_ntz(2021-06-25 10:11:12) - INTERVAL '20 15:40:32.998999' DAY
2021-06-04 18:30:39.001001


-- !query
SELECT make_timestamp(2021, 07, 11, 6, 30, 45.678)
-- !query schema
struct<make_timestamp(2021, 7, 11, 6, 30, 45.678):timestamp>
-- !query output
2021-07-11 06:30:45.678


-- !query
SELECT make_timestamp(2021, 07, 11, 6, 30, 60.007)
-- !query schema
struct<make_timestamp(2021, 7, 11, 6, 30, 60.007):timestamp>
-- !query output
NULL


-- !query
select date'999999-03-18'
-- !query schema
@@ -1603,19 +1619,27 @@ struct<DATE '+999999-03-18':date>


-- !query
select date'015'
select date'-0001-1-28'
-- !query schema
struct<DATE '-0001-01-28':date>
-- !query output
-0001-01-28


-- !query
select date'0015'
-- !query schema
struct<DATE '0015-01-01':date>
-- !query output
0015-01-01


-- !query
select date'-1-1-28'
select cast('015' as date)
-- !query schema
struct<DATE '-0001-01-28':date>
struct<CAST(015 AS DATE):date>
-- !query output
-0001-01-28
0015-01-01


-- !query
@@ -1635,7 +1659,7 @@ struct<TIMESTAMP '-1969-12-31 16:00:00':timestamp>


-- !query
select timestamp'015-03-18 16:00:00'
select timestamp'0015-03-18 16:00:00'
-- !query schema
struct<TIMESTAMP '0015-03-18 16:00:00':timestamp>
-- !query output