Binary file added docs/img/type-precedence-list.png
77 changes: 76 additions & 1 deletion docs/sql-ref-ansi-compliance.md
@@ -66,7 +66,7 @@ SELECT abs(-2147483648);
+----------------+
```

### Type Conversion
### Cast
> **Contributor:** In this section, it says "In future releases, the behaviour of type coercion might change along with the other two type conversion rules." We should update it.

Spark SQL has three kinds of type conversions: explicit casting, type coercion, and store assignment casting.
When `spark.sql.ansi.enabled` is set to `true`, explicit casting by `CAST` syntax throws a runtime exception for illegal cast patterns defined in the standard, e.g. casts from a string to an integer.
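For example (a minimal sketch; the exact error text differs across Spark versions), a string that cannot be parsed as a number fails at runtime rather than yielding NULL:
```sql
-- Illustrative only: under ANSI mode this cast throws instead of returning NULL.
> SET spark.sql.ansi.enabled=true;
> SELECT CAST('a' AS INT);
Error: invalid input syntax for type numeric: a
```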
@@ -160,6 +160,81 @@ SELECT * FROM t;
+---+
```

### Type Coercion
#### Type Promotion and Precedence
When `spark.sql.ansi.enabled` is set to `true`, Spark SQL uses several rules that govern how conflicts between data types are resolved.
At the heart of this conflict resolution is the Type Precedence List, which defines whether values of a given data type can be promoted to another data type implicitly.

| Data type | Precedence list (from narrowest to widest) |
|-----------|--------------------------------------------------------------|
| Byte      | Byte -> Short -> Int -> Long -> Decimal -> Float* -> Double  |
| Short     | Short -> Int -> Long -> Decimal -> Float* -> Double          |
| Int       | Int -> Long -> Decimal -> Float* -> Double                   |
| Long      | Long -> Decimal -> Float* -> Double                          |
| Decimal   | Decimal -> Float* -> Double                                  |
| Float     | Float -> Double                                              |
| Double    | Double                                                       |
| Date      | Date -> Timestamp                                            |
| Timestamp | Timestamp                                                    |
| String    | String                                                       |
| Binary    | Binary                                                       |
| Boolean   | Boolean                                                      |
| Interval  | Interval                                                     |
| Map       | Map**                                                        |
| Array     | Array**                                                      |
| Struct    | Struct**                                                     |

> **Member:** Should we distinguish year-month and day-time interval types?
>
> **Member (Author):** Let's keep it simple in this section.

\* For least common type resolution, float is skipped to avoid loss of precision.

\*\* For a complex type, the precedence rule applies recursively to its component elements.

Special rules apply for string literals and untyped NULL.
A NULL can be promoted to any other type, while a string literal can be promoted to any simple data type.
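A brief sketch of these two rules (assuming `spark.sql.ansi.enabled=true`):
```sql
-- An untyped NULL is promoted to the type of the other argument.
> SELECT typeof(coalesce(NULL, DATE'2020-01-01'));
DATE
-- A string literal is promoted to a simple type such as BIGINT.
> SELECT typeof(coalesce('1', 1L));
BIGINT
```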

This is a graphical depiction of the precedence list as a directed tree:
<img src="img/type-precedence-list.png" width="80%" title="Type Precedence List" alt="Type Precedence List">

#### Least Common Type Resolution
The least common type from a set of types is the narrowest type reachable from the precedence list by all elements of the set of types.
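For instance, the least common type of TINYINT and INT is INT, since INT is the narrowest type that appears on the precedence lists of both. A quick sketch:
```sql
-- 1Y is a TINYINT literal and 1 is an INT literal; both precedence lists reach INT first.
> SELECT typeof(coalesce(1Y, 1));
INT
```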

The least common type resolution is used to:
- Decide whether a function expecting a parameter of a type can be invoked using an argument of a narrower type.
- Derive the argument type for functions which expect a shared argument type for multiple parameters, such as coalesce, least, or greatest.
- Derive the operand types for operators such as arithmetic operations or comparisons.
- Derive the result type for expressions such as the case expression.
- Derive the element, key, or value types for array and map constructors.

Special rules are applied if the least common type resolves to FLOAT. If any of the types is INT, BIGINT, or DECIMAL, the least common type is pushed to DOUBLE to avoid potential loss of digits.

#### Examples
The coalesce function accepts any set of argument types as long as they share a least common type.
The result type is the least common type of the arguments.
```sql
> SET spark.sql.ansi.enabled=true;
> SELECT typeof(coalesce(1Y, 1L, NULL));
BIGINT
> SELECT typeof(coalesce(1, DATE'2020-01-01'));
Error: Incompatible types [INT, DATE]

> SELECT typeof(coalesce(ARRAY(1Y), ARRAY(1L)));
ARRAY<BIGINT>
> SELECT typeof(coalesce(1, 1F));
DOUBLE
> SELECT typeof(coalesce(1L, 1F));
DOUBLE
> SELECT typeof(coalesce(1BD, 1F));
DOUBLE

-- The substring function expects arguments of type INT for the start and length parameters.
> SELECT substring('hello', 1, 2);
he
> SELECT substring('hello', '1', 2);
he
> SELECT substring('hello', 1L, 2);
Error: Argument 2 requires an INT type.
> SELECT substring('hello', str, 2) FROM VALUES(CAST('1' AS STRING)) AS T(str);
Error: Argument 2 requires an INT type.
```
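
Beyond `coalesce`, the same resolution applies to the other contexts listed above; a short sketch of the results the rules in this section predict:
```sql
-- Operands of an arithmetic operation are promoted to their least common type.
> SELECT typeof(1Y + 1L);
BIGINT
-- The result type of a CASE expression is the least common type of its branches.
> SELECT typeof(CASE WHEN true THEN 1Y ELSE 1L END);
BIGINT
-- Array elements are promoted to their least common type.
> SELECT typeof(ARRAY(1, 1L));
ARRAY<BIGINT>
```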

### SQL Functions

The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`).