Add DOW/dayofweek
wangyum committed Jun 1, 2018
commit c9d2bc348495669bd4347679547f1437f35367f1
@@ -592,7 +592,7 @@ primaryExpression
| identifier #columnReference
| base=primaryExpression '.' fieldName=identifier #dereference
| '(' expression ')' #parenthesizedExpression
- | EXTRACT '(' field=(YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND) FROM source=valueExpression ')' #extract
+ | EXTRACT '(' field=identifier FROM source=valueExpression ')' #extract
@HyukjinKwon @maropu @wangyum @huaxingao Just realized EXTRACT is not included in https://spark.apache.org/docs/latest/api/sql/index.html. Could we fix it in the upcoming built-in function doc page updates?

@maropu (Member), Apr 17, 2020:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Nice catch! The Python script that we are now working on (#28224) just dumps the entries of ExpressionDescription (ExpressionInfo), so the output unfortunately cannot include a doc entry for EXTRACT now. To document it, there are three options that I can think of:

  • (the simplest fix) Add a description of EXTRACT in the SELECT syntax page (e.g., the named_expression section), then add a link to date_part in the built-in function page.

  • Add a dummy ExpressionDescription for EXTRACT like this:

@ExpressionDescription(
  usage = "_FUNC_(field FROM source) - Extracts a part of the date/timestamp or interval source.",
  arguments = """ ... """,
  examples = """
    Examples:
      > SELECT _FUNC_(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456');
       2019
  """,
  since = "3.0.0")
case class Extract(...) extends DatePart(field, source, child)
  • Add a new entry for an alias name in ExpressionDescription like this:
@ExpressionDescription(
  usage = "_FUNC_(field FROM source) - Extracts a part of the date/timestamp or interval source.",
  arguments = """... """,
  alias = "extract",
  examples = """
    Examples:
      ...
      > SELECT _FUNC_ALIAS_(seconds FROM interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
       30.001001
  """,
  since = "3.0.0")
case class DatePart(...) extends RuntimeReplaceable {

Which one is preferred, or is there any other smarter idea?

@cloud-fan (Contributor):
EXTRACT is not an alias as it has different syntax. The second approach looks good.
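To illustrate the syntax difference, here is a minimal sketch; it assumes a Spark 3.0 session where date_part is available, and the timestamp literal, master, and app name are only examples:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("extract-vs-date_part").getOrCreate()

// EXTRACT uses the SQL-standard `field FROM source` form, so the field is not an ordinary argument.
spark.sql("SELECT extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456')").show()

// date_part takes the field as a plain string argument, which fits the function-doc template.
spark.sql("SELECT date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456')").show()

// Both queries should return 2019; only the surface syntax differs, which is why EXTRACT
// cannot be documented as a simple alias of date_part.

spark.stop()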

@maropu (Member):
Thanks for the check, @cloud-fan. OK, I'll open a PR to follow that approach.

;

constant
@@ -740,7 +740,7 @@ nonReserved
| VIEW | REPLACE
| IF
| POSITION
- | EXTRACT | YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND
+ | EXTRACT | YEAR | QUARTER | MONTH | WEEK | DAY | DOW | HOUR | MINUTE | SECOND
We can remove each term except for EXTRACT.

| NO | DATA
| START | TRANSACTION | COMMIT | ROLLBACK | IGNORE
| SORT | CLUSTER | DISTRIBUTE | UNSET | TBLPROPERTIES | SKEWED | STORED | DIRECTORIES | LOCATION
@@ -886,6 +886,7 @@ QUARTER: 'QUARTER';
MONTH: 'MONTH';
WEEK: 'WEEK';
DAY: 'DAY';
+ DOW: 'DOW';
HOUR: 'HOUR';
MINUTE: 'MINUTE';
SECOND: 'SECOND';
@@ -1210,23 +1210,34 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
* Create an Extract expression.
*/
override def visitExtract(ctx: ExtractContext): Expression = withOrigin(ctx) {
- ctx.field.getType match {
-   case SqlBaseParser.YEAR =>
-     Year(expression(ctx.source))
-   case SqlBaseParser.QUARTER =>
-     Quarter(expression(ctx.source))
-   case SqlBaseParser.MONTH =>
-     Month(expression(ctx.source))
-   case SqlBaseParser.WEEK =>
-     WeekOfYear(expression(ctx.source))
-   case SqlBaseParser.DAY =>
-     DayOfMonth(expression(ctx.source))
-   case SqlBaseParser.HOUR =>
-     Hour(expression(ctx.source))
-   case SqlBaseParser.MINUTE =>
-     Minute(expression(ctx.source))
-   case SqlBaseParser.SECOND =>
-     Second(expression(ctx.source))
+ val extractType = ctx.field.getText.toUpperCase(Locale.ROOT)
+ try {
+   extractType match {
+     case "YEAR" =>
+       Year(expression(ctx.source))
+     case "QUARTER" =>
+       Quarter(expression(ctx.source))
+     case "MONTH" =>
+       Month(expression(ctx.source))
+     case "WEEK" =>
+       WeekOfYear(expression(ctx.source))
+     case "DAY" =>
+       DayOfMonth(expression(ctx.source))
+     case "DOW" =>
"DAYOFWEEK" ?

+       DayOfWeek(expression(ctx.source))
+     case "HOUR" =>
+       Hour(expression(ctx.source))
+     case "MINUTE" =>
+       Minute(expression(ctx.source))
+     case "SECOND" =>
+       Second(expression(ctx.source))
+     case other =>
+       throw new ParseException(s"Literals of type '$other' are currently not supported.", ctx)
+   }
+ } catch {
+   case e: IllegalArgumentException =>
Do we need this try-catch?

+     val message = Option(e.getMessage).getOrElse(s"Exception parsing $extractType")
+     throw new ParseException(message, ctx)
+ }
}
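As a usage sketch of the parser change above, assuming a local SparkSession built with this patch (the DATE literal, master, and app name are only examples):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.ParseException

val spark = SparkSession.builder().master("local[1]").appName("extract-dow").getOrCreate()

// `dow` is resolved by visitExtract above to the DayOfWeek expression.
spark.sql("SELECT extract(dow FROM DATE '2018-06-01')").show()

// A field that matches none of the cases falls through to `case other` and is reported
// as a ParseException while the query text is parsed.
try {
  spark.sql("SELECT extract(not_supported FROM DATE '2018-06-01')")
} catch {
  case e: ParseException => println(e.getMessage)
}

spark.stop()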

@@ -52,7 +52,7 @@ class TableIdentifierParserSuite extends SparkFunSuite {
"true", "truncate", "update", "user", "values", "with", "regexp", "rlike",
"bigint", "binary", "boolean", "current_date", "current_timestamp", "date", "double", "float",
"int", "smallint", "timestamp", "at", "position", "both", "leading", "trailing",
"extract", "year", "quarter", "month", "week", "day", "hour", "minute", "second")
"extract", "year", "quarter", "month", "week", "day", "dow", "hour", "minute", "second")

val hiveStrictNonReservedKeyword = Seq("anti", "full", "inner", "left", "semi", "right",
"natural", "union", "intersect", "except", "database", "on", "join", "cross", "select", "from",
sql/core/src/test/resources/sql-tests/inputs/extract.sql (4 additions, 0 deletions)
@@ -10,8 +10,12 @@ select extract(week from c) from t;

select extract(day from c) from t;

+ select extract(dow from c) from t;

select extract(hour from c) from t;

select extract(minute from c) from t;

select extract(second from c) from t;

+ select extract(not_supported from c) from t;
sql/core/src/test/resources/sql-tests/results/extract.sql.out (31 additions, 9 deletions)
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
- -- Number of queries: 9
+ -- Number of queries: 11


-- !query 0
@@ -51,24 +51,46 @@ struct<dayofmonth(CAST(c AS DATE)):int>


-- !query 6
- select extract(hour from c) from t
+ select extract(dow from c) from t
-- !query 6 schema
- struct<hour(CAST(c AS TIMESTAMP)):int>
+ struct<dayofweek(CAST(c AS DATE)):int>
-- !query 6 output
- 7
+ 6


-- !query 7
- select extract(minute from c) from t
+ select extract(hour from c) from t
-- !query 7 schema
- struct<minute(CAST(c AS TIMESTAMP)):int>
+ struct<hour(CAST(c AS TIMESTAMP)):int>
-- !query 7 output
- 8
+ 7


-- !query 8
- select extract(second from c) from t
+ select extract(minute from c) from t
-- !query 8 schema
- struct<second(CAST(c AS TIMESTAMP)):int>
+ struct<minute(CAST(c AS TIMESTAMP)):int>
-- !query 8 output
8


-- !query 9
select extract(second from c) from t
-- !query 9 schema
struct<second(CAST(c AS TIMESTAMP)):int>
-- !query 9 output
9


-- !query 10
select extract(not_supported from c) from t
-- !query 10 schema
struct<>
-- !query 10 output
org.apache.spark.sql.catalyst.parser.ParseException

Literals of type 'NOT_SUPPORTED' are currently not supported.(line 1, pos 7)

== SQL ==
select extract(not_supported from c) from t
-------^^^