Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
SPARK-31710: TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_…
…MICROSECONDS to timestamp transfer
  • Loading branch information
TJX2014 committed May 22, 2020
commit 5e60ec5047d99efae061f4c716f33cb2146f78ed
Original file line number Diff line number Diff line change
Expand Up @@ -424,6 +424,9 @@ object FunctionRegistry {
expression[MakeInterval]("make_interval"),
expression[DatePart]("date_part"),
expression[Extract]("extract"),
expression[SecondsToTimestamp]("timestamp_seconds"),
expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
expression[MicroSecondsToTimestamp]("timestamp_microseconds"),
Copy link
Member

@HyukjinKwon HyukjinKwon May 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to have expressions presumably to get away from cast(ts as long) behaviour which is not invasive. But it would be more interesting to see how other DBMSes solved this problem. From arbitrary googling, vendors have different approaches.

Seems like BigQuery has these three functions (https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions?hl=en#timestamp_seconds). Shall we match the naming at least?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I didn't realize it's MILLIS not MILLISECONDS. Let's follow the short name.

Copy link
Contributor Author

@TJX2014 TJX2014 May 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan, is it means we shall rename MilliSecondsToTimestamp to MillisToTimestamp and MicroSecondsToTimestamp to MicrosToTimestamp. I will correct it if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal but yeah let's do that.


// collection functions
expression[CreateArray]("array"),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -401,6 +401,92 @@ case class DayOfYear(child: Expression) extends UnaryExpression with ImplicitCas
}
}

@ExpressionDescription(
usage = "_FUNC_(date) - Returns timestamp from seconds.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_FUNC_(seconds) - Creates timestamp from the number of seconds since UTC epoch.

examples = """
Examples:
> SELECT _FUNC_(1230219000);
"2008-12-25 07:30:00.0"
""",
group = "datetime_funcs",
since = "3.1.0")
case class SecondsToTimestamp(child: Expression)
extends NumberToTimestampBase {

override def upScaleFactor: SQLTimestamp = MICROS_PER_SECOND

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that factor really a timestamp? That seems weird, it's just a Long in the base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, We need just a param, so look around I found the downScaleFactor , so the upScaleFactor is used, Could you please give any suggest, could it be upScaleParam?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean that the type should be Long, just like the base class:

Suggested change
override def upScaleFactor: SQLTimestamp = MICROS_PER_SECOND
override def upScaleFactor: Long = MICROS_PER_SECOND

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will correct it, good job.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


override def prettyName: String = "timestamp_seconds"
}

@ExpressionDescription(
usage = "_FUNC_(date) - Returns timestamp from milliseconds.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_FUNC_(seconds) - Creates timestamp from the number of milliseconds since UTC epoch.

examples = """
Examples:
> SELECT _FUNC_(1230219000000);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include precision in the example, e.g.

Suggested change
> SELECT _FUNC_(1230219000000);
> SELECT _FUNC_(12302190000123);
"2008-12-25 07:30:00.123"

Same for micros below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, datetime.sql is also syncronized.

"2008-12-25 07:30:00.0"
""",
group = "datetime_funcs",
since = "3.1.0")
case class MilliSecondsToTimestamp(child: Expression)
extends NumberToTimestampBase {

override def upScaleFactor: SQLTimestamp = MICROS_PER_MILLIS

override def prettyName: String = "timestamp_milliseconds"
}

@ExpressionDescription(
usage = "_FUNC_(date) - Returns timestamp from microseconds.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

examples = """
Examples:
> SELECT _FUNC_(1230219000000000);
"2008-12-25 07:30:00.0"
""",
group = "datetime_funcs",
since = "3.1.0")
case class MicroSecondsToTimestamp(child: Expression)
extends NumberToTimestampBase {

override def upScaleFactor: SQLTimestamp = 1L

override def prettyName: String = "timestamp_microseconds"
}

abstract class NumberToTimestampBase extends UnaryExpression

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a readability perspective it may be better to put the base class before the subclasses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

with ImplicitCastInputTypes{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a space before {


protected def upScaleFactor: Long

override def inputTypes: Seq[AbstractDataType] = Seq(LongType, IntegerType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should accept LongType only. The analyzer will cast the input to long for you, if possible.


override def dataType: DataType = TimestampType

override def eval(input: InternalRow): Any = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can override nullSafeEval, to skip the null check

val t = child.eval(input)
if (t == null) {
null
} else {
child.dataType match {
case IntegerType =>
Math.multiplyExact(t.asInstanceOf[Int].toLong, upScaleFactor)
case LongType =>
Math.multiplyExact(t.asInstanceOf[Long], upScaleFactor)
}
}
}

override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
child.dataType match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

case IntegerType =>
defineCodeGen(ctx, ev, _ => s"java.lang.Math.multiplyExact(" +
s"${ev.value.asInstanceOf[Integer].toLong}, ${upScaleFactor})")
case LongType =>
defineCodeGen(ctx, ev, _ => s"java.lang.Math.multiplyExact(" +
s"${ev.value.asInstanceOf[Long]}, ${upScaleFactor})")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be ev.value, otherwise you get class cast exception, as ev.value is a string.

}
}
}

@ExpressionDescription(
usage = "_FUNC_(date) - Returns the year component of the date/timestamp.",
examples = """
Expand Down
22 changes: 22 additions & 0 deletions sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Original file line number Diff line number Diff line change
Expand Up @@ -3495,6 +3495,28 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
assert(df4.schema.head.name === "randn(1)")
checkIfSeedExistsInExplain(df2)
}

test("SPARK-31710: " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move the tests to datetime.sql

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we remove it? I don't think it provides more test coverage than datetime.sql

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we shall remove it because it has the same coverage as datetime.sql.

"TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer") {
val df1 = sql("select TIMESTAMP_SECONDS(1230219000) as timestamp")
checkAnswer(df1, Row(Timestamp.valueOf("2008-12-25 07:30:00.0")))

val df2 = sql("select TIMESTAMP_MILLISECONDS(1230219000000) as timestamp")
checkAnswer(df2, Row(Timestamp.valueOf("2008-12-25 07:30:00.0")))

val df3 = sql("select TIMESTAMP_MICROSECONDS(1230219000000000) as timestamp")
checkAnswer(df3, Row(Timestamp.valueOf("2008-12-25 07:30:00.0")))

val df4 = sql("select TIMESTAMP_SECONDS(-1230219000) as timestamp")
checkAnswer(df4, Row(Timestamp.valueOf("1931-01-07 00:30:00.0")))

val df5 = sql("select TIMESTAMP_MILLISECONDS(-1230219000000) as timestamp")
checkAnswer(df5, Row(Timestamp.valueOf("1931-01-07 00:30:00.0")))

val df6 = sql("select TIMESTAMP_MICROSECONDS(-1230219000000000) as timestamp")
checkAnswer(df6, Row(Timestamp.valueOf("1931-01-07 00:30:00.0")))

}
}

case class Foo(bar: Option[String])