10 changes: 6 additions & 4 deletions docs/sql-ref-ansi-compliance.md
@@ -19,19 +19,21 @@ license: |
limitations under the License.
---

Spark SQL has two options to comply with the SQL standard: `spark.sql.ansi.enabled` and `spark.sql.storeAssignmentPolicy` (See a table below for details).
Since Spark 3.0, Spark SQL introduces two experimental options to comply with the SQL standard: `spark.sql.ansi.enabled` and `spark.sql.storeAssignmentPolicy` (See a table below for details).

When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard in basic behaviours (e.g., arithmetic operations, type conversion, and SQL parsing).
Moreover, Spark SQL has an independent option to control implicit casting behaviours when inserting rows in a table.
The casting behaviours are defined as store assignment rules in the standard.
When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules.

When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules. This is a separate configuration because its default value is `ANSI`, while the configuration `spark.sql.ansi.enabled` is disabled by default.
Contributor
separate -> separated?

Member Author
Separate can be an adjective
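
For concreteness, here is a minimal sketch (not part of the PR or the docs page) of both configurations in action, assuming a local Spark 3.0+ `SparkSession`; the exact exception types and messages may differ, since runtime errors can surface wrapped in a `SparkException`.

```scala
import scala.util.{Failure, Success, Try}

import org.apache.spark.sql.SparkSession

object AnsiComplianceSketch extends App {
  // Local session purely for illustration; any Spark 3.0+ session works.
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("ansi-compliance-sketch")
    .getOrCreate()

  // Default behaviour: integer overflow wraps around silently.
  spark.conf.set("spark.sql.ansi.enabled", "false")
  spark.sql("SELECT 2147483647 + 1 AS v").show()   // v = -2147483648

  // ANSI mode: the same overflow is rejected with a runtime error
  // (possibly wrapped in a SparkException, depending on where it is evaluated).
  spark.conf.set("spark.sql.ansi.enabled", "true")
  Try(spark.sql("SELECT 2147483647 + 1 AS v").collect()) match {
    case Failure(e) => println(s"overflow rejected: ${e.getMessage}")
    case Success(_) => println("unexpected: overflow was not rejected")
  }

  // Store assignment: with the default ANSI policy, inserting a value that cannot be
  // safely coerced (e.g. a string into an INT column) is rejected at analysis time.
  spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")
  spark.sql("CREATE TABLE t(i INT) USING parquet")
  Try(spark.sql("INSERT INTO t VALUES ('not a number')")) match {
    case Failure(e) => println(s"insert rejected: ${e.getMessage}")
    case Success(_) => println("unexpected: insert was accepted")
  }

  spark.stop()
}
```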


<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.sql.ansi.enabled</code></td>
<td>false</td>
<td>
When true, Spark tries to conform to the ANSI SQL specification:
(Experimental) When true, Spark tries to conform to the ANSI SQL specification:
Member
What does Experimental mean here? Is it the same as the @Experimental annotation? Anyway, this statement is just copied from the one in the SQL configuration document (#27459). So, instead of adding the (Experimental) prefix manually, I personally think it's better to add it automatically via sql/gen-sql-config-docs.py. For example, how about adding an `experimental()` method in `ConfigBuilder`:

  val ANSI_ENABLED = buildConf("spark.sql.ansi.enabled")
    .experimental()
    .doc("When true, Spark tries to conform to the ANSI SQL specification: 1. Spark will " +
      "throw a runtime exception if an overflow occurs in any operation on integral/decimal " +
      "field. 2. Spark will forbid using the reserved keywords of ANSI SQL as identifiers in " +
      "the SQL parser.")
    .booleanConf
    .createWithDefault(false)

Then, the script adds (Experimental) at the head of its description.
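
Purely to illustrate the idea (hypothetical names, not Spark's actual ConfigBuilder or doc generator), a self-contained sketch of how such a flag could drive the prefix:

```scala
// Hypothetical sketch, not Spark's real ConfigBuilder: the builder carries an
// `experimental` flag, and a generator like sql/gen-sql-config-docs.py would
// prepend "(Experimental)" when rendering the description.
final case class ConfEntry(key: String, doc: String, isExperimental: Boolean) {
  def renderedDoc: String = if (isExperimental) s"(Experimental) $doc" else doc
}

final class SketchConfigBuilder(key: String) {
  private var experimentalFlag = false
  private var docText = ""

  def experimental(): this.type = { experimentalFlag = true; this }
  def doc(text: String): this.type = { docText = text; this }
  def build(): ConfEntry = ConfEntry(key, docText, experimentalFlag)
}

object ExperimentalDocDemo extends App {
  val ansiEnabled = new SketchConfigBuilder("spark.sql.ansi.enabled")
    .experimental()
    .doc("When true, Spark tries to conform to the ANSI SQL specification.")
    .build()

  // Prints: (Experimental) When true, Spark tries to conform to the ANSI SQL specification.
  println(ansiEnabled.renderedDoc)
}
```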

Member Author
@maropu I was following this one: https://spark.apache.org/docs/latest/configuration.html
I think it's a good idea to add a new method in `ConfigBuilder`, but I prefer to keep it this way for this PR, as the configuration table here is not generated from the code.

Member
Yea, I think the fix itself in this PR looks fine. cc: @dongjoon-hyun @cloud-fan @HyukjinKwon

Member
In general, this change makes sense to me. If we have more experimental public confs, we should consider doing this in a proper way.

Regarding ANSI mode, we need to consider the roadmap for Spark 3.x: which features are still missing and what kinds of behavior changes we plan to add.

1. Spark will throw a runtime exception if an overflow occurs in any operation on integral/decimal field.
2. Spark will forbid using the reserved keywords of ANSI SQL as identifiers in the SQL parser.
</td>
@@ -40,7 +42,7 @@ When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with
<td><code>spark.sql.storeAssignmentPolicy</code></td>
<td>ANSI</td>
<td>
When inserting a value into a column with different data type, Spark will perform type coercion.
(Experimental) When inserting a value into a column with different data type, Spark will perform type coercion.
Currently, we support 3 policies for the type coercion rules: ANSI, legacy and strict. With ANSI policy,
Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL.
It disallows certain unreasonable type conversions such as converting string to int or double to boolean.