[SPARK-40585][SQL] Double-quoted identifiers should be available only in ANSI mode #38147
Conversation
(Branch force-pushed from 41dc5be to 028a804.)
```scala
  .booleanConf
  .createWithDefault(false)

val ENFORCE_RESERVED_KEYWORDS = buildConf("spark.sql.ansi.enforceReservedKeywords")
```
Not related to this PR, but I moved this code next to the ANSI_ENABLED flag to group the ANSI SQL related flags.
```
-- !query output
org.apache.spark.sql.AnalysisException
Table or view not found: not_exist; line 1 pos 14
org.apache.spark.sql.catalyst.parser.ParseException
```
hmm, do we really need to run this SQL gold file in non-ansi mode? We will just see a bunch of parser errors.
I intend to ensure it won't work when ANSI mode is off.
I see there might be extra testing here: this test file itself contains both double_quoted_identifiers=false and double_quoted_identifiers=true, and now we run it in both ANSI and non-ANSI mode. So I think the unique testing coverage is:
- non-ANSI and double_quoted_identifiers=false, so a double-quoted string is still a string literal.
- non-ANSI and double_quoted_identifiers=true, so we see a parser exception.
- ANSI and double_quoted_identifiers=true, so the feature is on and being tested.

But ANSI and double_quoted_identifiers=false seems to be the same as case 2 above?
Again, the test set is small; I don't see the downside of testing all the combinations.
Let's consider reducing the tests when the set becomes bigger, which is quite unlikely.
I think it's time to refactor it now, as we run it again in the ANSI mode tests.
It's not just unnecessary test time; it also brings confusion: why do we test double_quoted_identifiers=false in ansi mode?
> but also brings confusion: why do we test double_quoted_identifiers=false in ansi mode?
Do you mean "in non-ansi mode"?
@cloud-fan so we at least need to test the 3 cases below:
- non-ansi mode and double-quoted id enabled
- ansi mode and double-quoted id enabled
- ansi mode and double-quoted id disabled
Any suggestion on the refactor?
We can have a double-quoted-identifiers.sql without duplicate queries. In .../inputs/ansi/, we add 2 files that import the golden file with different configs:

`double-quoted-identifiers-enabled.sql`:
```sql
--SET spark.sql.ansi.doubleQuotedIdentifiers = true
--IMPORT double-quoted-identifiers.sql
```

`double-quoted-identifiers-disabled.sql`:
```sql
--SET spark.sql.ansi.doubleQuotedIdentifiers = false
--IMPORT double-quoted-identifiers.sql
```
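To illustrate the mechanism: the `--SET`/`--IMPORT` directives let one golden file be replayed under different configs. A minimal sketch of how such a harness could expand these directives (illustrative only; Spark's actual SQLQueryTestSuite logic differs in detail, and `expand_test_file`/`read_file` are hypothetical names):

```python
def expand_test_file(text: str, read_file) -> tuple[dict, str]:
    """Collect --SET configs and inline --IMPORTed golden files."""
    configs, queries = {}, []
    for line in text.splitlines():
        if line.startswith("--SET "):
            # "--SET key = value" -> configs[key] = value
            key, _, value = line[len("--SET "):].partition("=")
            configs[key.strip()] = value.strip()
        elif line.startswith("--IMPORT "):
            # Inline the referenced golden file's queries
            queries.append(read_file(line[len("--IMPORT "):].strip()))
        else:
            queries.append(line)
    return configs, "\n".join(queries)
```

This way the query list lives in one place and each wrapper file only varies the config it sets.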
> non-ansi mode and double-quoted id enabled
I don't think so. This is a subconfig of ansi mode, and we only need to test the default case in non-ansi mode.
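The "subconfig" relationship the PR establishes can be sketched as a truth table (illustrative only; the function name is hypothetical, and the real gating lives in Spark's parser):

```python
def double_quoted_identifiers_effective(ansi_enabled: bool, sub_flag: bool) -> bool:
    """Double quotes denote identifiers only when BOTH spark.sql.ansi.enabled
    and spark.sql.ansi.doubleQuotedIdentifiers are true; in every other
    combination a double-quoted token stays a string literal."""
    return ansi_enabled and sub_flag
```

Under this gating, testing the sub-flag in non-ANSI mode adds no coverage, which is why only the default non-ANSI case needs a test.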
@cloud-fan Thanks for the suggestion. I updated the tests.
Merging to master
What changes were proposed in this pull request?
#38022 introduces an optional feature for supporting double-quoted identifiers. The feature is controlled by the flag `spark.sql.ansi.double_quoted_identifiers`, which is independent from the flag `spark.sql.ansi.enabled`. This is inconsistent with another ANSI SQL feature, "Enforce ANSI reserved keywords" (https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#sql-keywords-optional-disabled-by-default), which is only available when `spark.sql.ansi.enabled` is true. Thus, to make the ANSI flags consistent, I suggest making double-quoted identifiers available only under ANSI SQL mode.

Other than that, this PR renames the flag from `spark.sql.ansi.double_quoted_identifiers` to `spark.sql.ansi.doubleQuotedIdentifiers`.

Why are the changes needed?
To make the ANSI SQL related features consistent.
Does this PR introduce any user-facing change?
No, the feature is not released yet.
How was this patch tested?
New SQL test input file under ANSI mode.