@allisonwang-db
Contributor

@allisonwang-db allisonwang-db commented May 30, 2024

What changes were proposed in this pull request?

This PR adds support for creating user-defined SQL functions in the parser. Here is the SQL syntax:

CREATE [OR REPLACE] [TEMPORARY] FUNCTION [IF NOT EXISTS] [db_name.]function_name
([param_name param_type [COMMENT param_comment], ...])
RETURNS {ret_type | TABLE (ret_name ret_type [COMMENT ret_comment], ...)}
[routine_characteristic]
RETURN {expression | query};

routine_characteristic
  { LANGUAGE {SQL | IDENTIFIER} |
    [NOT] DETERMINISTIC |
    COMMENT function_comment |
    [CONTAINS SQL | READS SQL DATA] }
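As an illustration of the syntax above, here is a minimal sketch of one scalar and one table function. All names (`rectangle_area`, `my_db.recent_orders`, `my_db.orders`, and the parameters and columns) are hypothetical, and the sketch assumes that more than one routine characteristic may be listed in a single statement. Since this PR covers the parser only, whether these statements also analyze and run depends on the Spark version in use.

```sql
-- Scalar SQL UDF: the body is a single expression.
-- Hypothetical function and parameter names.
CREATE OR REPLACE TEMPORARY FUNCTION rectangle_area(
  width DOUBLE COMMENT 'Width of the rectangle',
  height DOUBLE COMMENT 'Height of the rectangle')
RETURNS DOUBLE
DETERMINISTIC
COMMENT 'Area of a rectangle'
RETURN width * height;

-- Table SQL UDF: the body is a query, and RETURNS lists the output columns.
-- my_db.orders is a hypothetical table.
CREATE FUNCTION IF NOT EXISTS my_db.recent_orders(min_total INT)
RETURNS TABLE (order_id INT COMMENT 'Order identifier', total INT)
READS SQL DATA
RETURN SELECT order_id, total FROM my_db.orders WHERE total >= min_total;
```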

Why are the changes needed?

To support SQL user-defined functions.

Does this PR introduce any user-facing change?

Yes. This PR adds parser support for creating user-defined SQL functions.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label May 30, 2024
@allisonwang-db allisonwang-db changed the title from "[SPARK-48479][SQL] Support creating SQL functions in parser" to "[SPARK-48479][SQL] Support creating scalar and table SQL UDFs in parser" Jun 17, 2024
@github-actions github-actions bot added the DOCS label Jun 17, 2024
@allisonwang-db
Contributor Author

cc @cloud-fan @dtenedor @srielau

Contributor

nit: looks like we can remove createOrReplaceTableColType as it's completely the same as this new colDefinition

Contributor Author

Good point. I will create a follow-up PR for this.

Contributor

shall we add an error class?

Contributor Author

This should be an internal error since we blocked languages other than SQL in the parser.

@cloud-fan
Contributor

There is a legitimate test failure: SPARK-43119: Get SQL Keywords

@cloud-fan
Contributor

@allisonwang-db there are merge conflicts now...

@allisonwang-db allisonwang-db force-pushed the spark-48479-sql-udf-parser branch from c0e198c to fac0c87 Compare June 19, 2024 18:26
@cloud-fan
Contributor

the pyspark failure is unrelated, thanks, merging to master!

@cloud-fan cloud-fan closed this in 0d9f8a1 Jun 20, 2024
@yaooqinn
Member

Can we explain why the PR description isn't consistent with the implementation? Were clauses like LANGUAGE and SECURITY introduced by accident?

* ([param_name param_type [COMMENT param_comment], ...])
* RETURNS {ret_type | TABLE (ret_name ret_type [COMMENT ret_comment], ...])}
* [routine_characteristics]
* RETURN {expression | TABLE ( query )};

Contributor

I think the PR description was copied from this comment, but we need to update the comment to match the full syntax. @allisonwang-db

Member

@yaooqinn yaooqinn Jun 20, 2024

The SECURITY clause and its keywords do not necessarily have to be introduced, judging by how privilege control works for other objects such as tables, columns, and UDFs written in other host languages.

Contributor

Yes, and Spark fails if the SECURITY clause is specified: https://github.com/apache/spark/pull/46816/files#diff-77a9aad2da3dc60210a2c4d2f3165d5f1d0acd54ca4811072a053225170ed748R808

It's just for a better error message: instead of ANTLR errors, we give a clear error message for unsupported features.

Member

Having better error messages for unsupported features sounds reasonable to me. However, we seem to be banning a MySQL feature rather than an ANSI one.

Member

Maybe later versions of the ANSI standard support these, but I don't have a copy to check :)

Contributor

cc @srielau for ANSI SQL syntax.

Contributor Author

Good catch. I've updated the description for this PR and will create a follow-up PR to fix it in the code.

cloud-fan pushed a commit that referenced this pull request Jun 21, 2024
…and colDefinitionType in parser

### What changes were proposed in this pull request?

This PR is a follow-up for #46816 to address this comment: #46816 (comment) to consolidate `createOrReplaceTableColType` and `colDefinitionType` since they are exactly the same.

### Why are the changes needed?

To make the code cleaner.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47047 from allisonwang-db/spark-48479-sql-udf-parser-follow-up.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>