Skip to content
Closed
Original file line number Diff line number Diff line change
Expand Up @@ -409,9 +409,12 @@ object IntervalUtils {
}

private object ParseState extends Enumeration {
type ParseState = Value

val PREFIX,
BEGIN_VALUE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can rename it to TRIM_BEFORE_PARSE_VALUE

Copy link
Member Author

@yaooqinn yaooqinn Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we may change PARSE_UNIT_VALUE to PARSE_VALUE too

How about renaming them all as,

    val PREFIX,
        BODY,
        SIGN,
        TRIM_BEFORE_VALUE,
        VALUE,
        VALUE_FRACTIONAL_PART,
        TRIM_BEFORE_UNIT,
        UNIT_BEGIN,
        UNIT_SUFFIX,
        UNIT_END = Value

Seems loud and clear enough for each state

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does BODY mean? others LGTM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interval (PREFIX, BODY)
BODY (SIGN, VALUE , UNIT)+

Seems not persuadable enough

Copy link
Member Author

@yaooqinn yaooqinn Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or NEXT_VALUE_UNIT? NEXT_VALUE_UNIT_PAIR

PARSE_SIGN,
TRIM_VALUE,
Copy link
Contributor

@cloud-fan cloud-fan Nov 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make the name clearer? e.g. TRIM_BEFORE_PARSE_UNIT_VALUE

PARSE_UNIT_VALUE,
FRACTIONAL_PART,
BEGIN_UNIT_NAME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In BEGIN_UNIT_NAME we also trim the spaces. Can we have a consistent way to do it? Now there are 2 ways:

  1. have an intermedia state to trim the spaces
  2. trim the spaces at the beginning of a state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add a TRIM_UNIT state to unify these 2 ways

Expand Down Expand Up @@ -439,7 +442,7 @@ object IntervalUtils {
val s = input.trim.toLowerCase
// scalastyle:on
val bytes = s.getBytes
if (bytes.length == 0) {
if (bytes.isEmpty) {
return null
}
var state = PREFIX
Expand All @@ -452,6 +455,13 @@ object IntervalUtils {
var fractionScale: Int = 0
var fraction: Int = 0

def trimToNextState(b: Byte, next: ParseState): Unit = {
b match {
case ' ' => i += 1
case _ => state = next
}
}

while (i < bytes.length) {
val b = bytes(i)
state match {
Expand All @@ -464,11 +474,7 @@ object IntervalUtils {
}
}
state = BEGIN_VALUE
case BEGIN_VALUE =>
b match {
case ' ' => i += 1
case _ => state = PARSE_SIGN
}
case BEGIN_VALUE => trimToNextState(b, PARSE_SIGN)
case PARSE_SIGN =>
b match {
case '-' =>
Expand All @@ -486,7 +492,8 @@ object IntervalUtils {
// Sets the scale to an invalid value to track fraction presence
// in the BEGIN_UNIT_NAME state
fractionScale = -1
state = PARSE_UNIT_VALUE
state = TRIM_VALUE
case TRIM_VALUE => trimToNextState(b, PARSE_UNIT_VALUE)
case PARSE_UNIT_VALUE =>
b match {
case _ if '0' <= b && b <= '9' =>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ class IntervalUtilsSuite extends SparkFunSuite {
"-1 MONTH 1 day -1 microseconds" -> new CalendarInterval(-1, 1, -1),
" 123 MONTHS 123 DAYS 123 Microsecond " -> new CalendarInterval(123, 123, 123),
"interval -1 day +3 Microseconds" -> new CalendarInterval(0, -1, 3),
"interval - 1 day + 3 Microseconds" -> new CalendarInterval(0, -1, 3),
" interval 8 years -11 months 123 weeks -1 day " +
"23 hours -22 minutes 1 second -123 millisecond 567 microseconds " ->
new CalendarInterval(85, 860, 81480877567L)).foreach { case (input, expected) =>
Expand Down
48 changes: 26 additions & 22 deletions sql/core/benchmarks/IntervalBenchmark-results.txt
Original file line number Diff line number Diff line change
@@ -1,25 +1,29 @@
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.1
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.14.6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So old jdk? ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated and regen-ed with both new jdk8 and jdk11, thanks.

Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
cast strings to intervals: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
prepare string w/ interval 422 437 16 2.4 421.8 1.0X
prepare string w/o interval 369 374 8 2.7 369.4 1.1X
1 units w/ interval 426 430 5 2.3 425.5 1.0X
1 units w/o interval 382 386 5 2.6 382.1 1.1X
2 units w/ interval 519 527 9 1.9 518.5 0.8X
2 units w/o interval 505 512 6 2.0 505.4 0.8X
3 units w/ interval 650 653 3 1.5 649.6 0.6X
3 units w/o interval 630 633 4 1.6 629.7 0.7X
4 units w/ interval 755 761 6 1.3 754.9 0.6X
4 units w/o interval 745 749 3 1.3 745.3 0.6X
5 units w/ interval 882 891 14 1.1 882.0 0.5X
5 units w/o interval 867 870 3 1.2 867.4 0.5X
6 units w/ interval 1008 1013 4 1.0 1008.2 0.4X
6 units w/o interval 990 995 5 1.0 990.4 0.4X
7 units w/ interval 1057 1063 6 0.9 1056.9 0.4X
7 units w/o interval 1042 1046 4 1.0 1042.3 0.4X
8 units w/ interval 1206 1208 2 0.8 1206.0 0.3X
8 units w/o interval 1194 1198 4 0.8 1194.1 0.4X
9 units w/ interval 1322 1324 3 0.8 1321.5 0.3X
9 units w/o interval 1314 1318 4 0.8 1313.6 0.3X
prepare string w/ interval 483 562 68 2.1 483.4 1.0X
prepare string w/o interval 444 451 8 2.3 443.7 1.1X
1 units w/ interval 481 508 25 2.1 480.6 1.0X
1 units w/o interval 428 453 22 2.3 427.7 1.1X
2 units w/ interval 613 664 47 1.6 613.2 0.8X
2 units w/o interval 598 632 32 1.7 597.9 0.8X
3 units w/ interval 1127 1159 35 0.9 1127.5 0.4X
3 units w/o interval 1116 1131 18 0.9 1116.2 0.4X
4 units w/ interval 1263 1276 12 0.8 1263.3 0.4X
4 units w/o interval 1288 1310 29 0.8 1288.0 0.4X
5 units w/ interval 1391 1411 25 0.7 1391.1 0.3X
5 units w/o interval 1384 1405 20 0.7 1384.2 0.3X
6 units w/ interval 1528 1537 13 0.7 1527.7 0.3X
6 units w/o interval 1550 1550 1 0.6 1549.6 0.3X
7 units w/ interval 1789 1797 12 0.6 1789.1 0.3X
7 units w/o interval 1756 1851 83 0.6 1755.8 0.3X
8 units w/ interval 1860 1867 10 0.5 1859.7 0.3X
8 units w/o interval 1848 1853 6 0.5 1848.5 0.3X
9 units w/ interval 1928 1938 9 0.5 1928.0 0.3X
9 units w/o interval 1918 1924 5 0.5 1918.3 0.3X
10 units w/ interval 2078 2088 10 0.5 2078.3 0.2X
10 units w/o interval 2073 2084 20 0.5 2072.7 0.2X
11 units w/ interval 2207 2221 12 0.5 2207.2 0.2X
11 units w/o interval 2213 2223 16 0.5 2213.4 0.2X

9 changes: 9 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/interval.sql
Original file line number Diff line number Diff line change
Expand Up @@ -41,3 +41,12 @@ select max(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 se

-- min
select min(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 seconds') t(v);

-- SPARK-29605: cast string to intervals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be SPARK-29822 because this is newly added test coverage for this PR SPARK-29822.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I just rm it from here, thanks

select cast(v as interval) from values ('1 second') t(v);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: select cast('1 second' as interval)

select cast(v as interval) from values ('+1 second') t(v);
select cast(v as interval) from values ('-1 second') t(v);
select cast(v as interval) from values ('+ 1 second') t(v);
select cast(v as interval) from values ('- 1 second') t(v);
select cast(v as interval) from values ('- -1 second') t(v);
select cast(v as interval) from values ('- +1 second') t(v);
58 changes: 57 additions & 1 deletion sql/core/src/test/resources/sql-tests/results/interval.sql.out
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 22
-- Number of queries: 29


-- !query 0
Expand Down Expand Up @@ -178,3 +178,59 @@ select min(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 se
struct<min(CAST(v AS INTERVAL)):interval>
-- !query 21 output
1 seconds


-- !query 22
select cast(v as interval) from values ('1 second') t(v)
-- !query 22 schema
struct<v:interval>
-- !query 22 output
1 seconds


-- !query 23
select cast(v as interval) from values ('+1 second') t(v)
-- !query 23 schema
struct<v:interval>
-- !query 23 output
1 seconds


-- !query 24
select cast(v as interval) from values ('-1 second') t(v)
-- !query 24 schema
struct<v:interval>
-- !query 24 output
-1 seconds


-- !query 25
select cast(v as interval) from values ('+ 1 second') t(v)
-- !query 25 schema
struct<v:interval>
-- !query 25 output
1 seconds


-- !query 26
select cast(v as interval) from values ('- 1 second') t(v)
-- !query 26 schema
struct<v:interval>
-- !query 26 output
-1 seconds


-- !query 27
select cast(v as interval) from values ('- -1 second') t(v)
-- !query 27 schema
struct<v:interval>
-- !query 27 output
NULL


-- !query 28
select cast(v as interval) from values ('- +1 second') t(v)
-- !query 28 schema
struct<v:interval>
-- !query 28 output
NULL
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,8 @@ object IntervalBenchmark extends SqlBasedBenchmark {
override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
val N = 1000000
val timeUnits = Seq(
"13 months", "100 weeks", "9 days", "12 hours",
"13 months", " 1 months",
"100 weeks", "9 days", "12 hours", "- 3 hours",
"5 minutes", "45 seconds", "123 milliseconds", "567 microseconds")
val intervalToTest = ListBuffer[String]()

Expand Down