Skip to content

Conversation

@gengliangwang
Copy link
Member

What changes were proposed in this pull request?

#38022 introduces an optional feature for supporting double-quoted identifiers. The feature is controlled by a flag spark.sql.ansi.double_quoted_identifiers which is independent from the flag spark.sql.ansi.enabled.
This is inconsistent with another ANSI SQL feature "Enforce ANSI reserved keywords": https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#sql-keywords-optional-disabled-by-default, which is only available when spark.sql.ansi.enabled is true.

Thus, to make the ANSI flags consistent, I suggest making double-quoted identifiers only available under ANSI SQL mode.
Other than that, this PR renames it from spark.sql.ansi.double_quoted_identifiers to spark.sql.ansi.doubleQuotedIdentifiers

Why are the changes needed?

To make the ANSI SQL related features consistent.

Does this PR introduce any user-facing change?

No, the feature is not released yet.

How was this patch tested?

New SQL test input file under ANSI mode.

.booleanConf
.createWithDefault(false)

val ENFORCE_RESERVED_KEYWORDS = buildConf("spark.sql.ansi.enforceReservedKeywords")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note related to this PR, but I move this code next to the ANSI_ENABLED flag for grouping the ANSI SQL related flags.

@gengliangwang
Copy link
Member Author

cc @srielau @entong as well

-- !query output
org.apache.spark.sql.AnalysisException
Table or view not found: not_exist; line 1 pos 14
org.apache.spark.sql.catalyst.parser.ParseException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, do we really need to run this SQL gold file in non-ansi mode? We will just see a bunch of parser errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend to ensure it won't work when ANSI mode is off.

Copy link
Contributor

@amaliujia amaliujia Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am seeing there might be an extra testing here:

so this test file itself contains both double_quoted_identifiers=false and double_quoted_identifiers=true and now we run it into both ANSI and non-ANSI.

So I think what the unique testing coverage is:

  1. non-ANSI and double_quoted_identifiers=false, so double quoted is still a string.
  2. non-ANSI and double_quoted_identifiers=true, so we see parser exception.
  3. ANSI and double_quoted_identifiers=true, this feature is on and being tested.

But for ANSI and double_quoted_identifiers=false which seems to be the same as number 2 above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the test set is small. I don't see the downside of testing all the combinations.
Let's consider about reducing the tests when it becomes bigger, which is quite unlikely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's time to refactor it now, as we run it again in the ansi mode tests.

It's not just unnecessary test time, but also brings confusion: why do we test double_quoted_identifiers=false in ansi mode?

Copy link
Member Author

@gengliangwang gengliangwang Oct 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but also brings confusion: why do we test double_quoted_identifiers=false in ansi mode?

Do you mean "in non-ansi mode"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan so we at least need to test the 3 cases below:

  • non-ansi mode and double-quoted id enabled
  • ansi mode and double-quoted id enabled
  • ansi mode and double-quoted id disabled

Any suggestion on the refactor?

Copy link
Contributor

@cloud-fan cloud-fan Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have a double-quoted-identifiers.sql without duplicate queries. In .../inputs/ansi/, we add 2 files to import the golden file with different configs

// file double-quoted-identifiers-enabled.sql
--SET spark.sql.ansi.doubleQuotedIdentifiers = true
--IMPORT double-quoted-identifiers.sql
// file double-quoted-identifiers-disabled.sql
--SET spark.sql.ansi.doubleQuotedIdentifiers = false
--IMPORT double-quoted-identifiers.sql

Copy link
Contributor

@cloud-fan cloud-fan Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-ansi mode and double-quoted id enabled

I don't think so. This is a subconfig of ansi mode, and we only need to test the default case in non-ansi mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan Thanks for the suggestion. I updated the tests.

@gengliangwang
Copy link
Member Author

Merging to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants