[SPARK-34249][DOCS] Add documentation for ANSI implicit cast rules #33516
Closed
init doc for type coercion
commit cac633772226ebaf3acdc39329308175ed9ec4f3
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -160,6 +160,81 @@ SELECT * FROM t; | |
| +---+ | ||
| ``` | ||
|
|
||
| ### Type coercion | ||
| #### Type Promotion and Precedence | ||
| Spark SQL uses several rules that govern how conflicts between data types are resolved. | ||
| At the heart of this conflict resolution is the Type Precedence List, which defines whether values of a given data type can be implicitly promoted to another data type. | ||
|
|
||
| | Data type | Precedence list (from narrowest to widest) | | ||
| |-----------|------------------------------------------------------------------| | ||
| | Byte | Byte -> Short -> Int -> Long -> Decimal -> Float* -> Double | | ||
| | Short | Short -> Int -> Long -> Decimal -> Float* -> Double | | ||
| | Int | Int -> Long -> Decimal -> Float* -> Double | | ||
| | Long | Long -> Decimal -> Float* -> Double | | ||
| | Decimal | Decimal -> Float* -> Double | | ||
| | Float | Float -> Double | | ||
| | Double | Double | | ||
| | Date | Date -> Timestamp | | ||
| | Timestamp | Timestamp | | ||
| | String | String | | ||
| | Binary | Binary | | ||
| | Boolean | Boolean | | ||
| | Interval | Interval | | ||
|
Member
Should we distinguish year-month and day-time interval types?
Member, Author
Let's keep it simple in this section.
||
| | Map | Map** | | ||
| | Array | Array** | | ||
| | Struct | Struct** | | ||
|
|
||
| \* For least common type resolution, Float is skipped to avoid loss of precision. | ||
|
|
||
| \*\* For a complex type, the precedence rule applies recursively to its component elements. | ||
|
|
||
| Special rules apply for string literals and untyped NULL. | ||
| A NULL can be promoted to any other type, while a string literal can be promoted to any simple data type. | ||
|
|
||
| This is a graphical depiction of the precedence list as a directed tree: | ||
| <img src="img/type-precedence-list.png" width="80%" title="Type Precedence List" alt="Type Precedence List"> | ||
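The promotion rules above can be modeled in a few lines. The following is a minimal sketch in Python, written only as an illustration and not taken from Spark's implementation. The `PRECEDENCE` dict transcribes the table above; the names `can_promote`, `untyped_null`, and `string_literal` are hypothetical. Complex types (Map, Array, Struct) are left out, so the recursive rule is not modeled here.

```python
# Illustrative model of the precedence table above; this is NOT Spark code.
NUMERIC_CHAIN = ["Byte", "Short", "Int", "Long", "Decimal", "Float", "Double"]

# Each numeric type can be promoted to itself and to every wider numeric type.
PRECEDENCE = {t: NUMERIC_CHAIN[i:] for i, t in enumerate(NUMERIC_CHAIN)}
PRECEDENCE.update({
    "Date": ["Date", "Timestamp"],
    "Timestamp": ["Timestamp"],
    "String": ["String"],
    "Binary": ["Binary"],
    "Boolean": ["Boolean"],
    "Interval": ["Interval"],
})


def can_promote(source, target, *, untyped_null=False, string_literal=False):
    """Return True if a value of `source` may be implicitly promoted to `target`.

    `untyped_null` and `string_literal` are hypothetical flags that stand in
    for the two special cases described in the text.
    """
    if untyped_null:
        return True  # an untyped NULL can be promoted to any type
    if string_literal and target in PRECEDENCE:
        return True  # a string literal can be promoted to any simple type
    return target in PRECEDENCE.get(source, [])
```

Under this model `can_promote("Long", "Int")` is false while `can_promote("String", "Int", string_literal=True)` is true, which is consistent with the `substring` examples later in this section: a `BIGINT` argument is rejected for an `INT` parameter, but the literal `'1'` is accepted.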
|
|
||
| #### Least Common Type Resolution | ||
| The least common type of a set of types is the narrowest type that every type in the set can reach through its precedence list. | ||
|
|
||
| The least common type resolution is used to: | ||
| - Decide whether a function expecting a parameter of a type can be invoked using an argument of a narrower type. | ||
| - Derive the argument type for functions which expect a shared argument type for multiple parameters, such as coalesce, least, or greatest. | ||
gengliangwang marked this conversation as resolved.
|
||
| - Derive the operand types for operators such as arithmetic operations or comparisons. | ||
| - Derive the result type for expressions such as the case expression. | ||
| - Derive the element, key, or value types for array and map constructors. | ||
| Special rules are applied if the least common type resolves to FLOAT. If any of the types is INT, BIGINT, or DECIMAL, the least common type is pushed to DOUBLE to avoid potential loss of digits. | ||
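The resolution described above can be sketched as a small function. This is an illustrative Python model under stated assumptions, not Spark's implementation: the function name `least_common_type`, the `"Null"` marker for an untyped NULL, and the `("Array", element)` tuple encoding are all hypothetical. It follows the text literally, so FLOAT is pushed to DOUBLE only when INT, BIGINT (Long), or DECIMAL is among the inputs.

```python
# Illustrative model of least common type resolution; this is NOT Spark code.
NUMERIC_CHAIN = ["Byte", "Short", "Int", "Long", "Decimal", "Float", "Double"]
PRECEDENCE = {t: NUMERIC_CHAIN[i:] for i, t in enumerate(NUMERIC_CHAIN)}
PRECEDENCE.update({
    "Date": ["Date", "Timestamp"],
    "Timestamp": ["Timestamp"],
    "String": ["String"],
    "Binary": ["Binary"],
    "Boolean": ["Boolean"],
    "Interval": ["Interval"],
})

# Types that push a FLOAT result to DOUBLE, as listed in the text.
PUSH_TO_DOUBLE = {"Int", "Long", "Decimal"}


def least_common_type(types):
    """Return the least common type of `types`, or None if there is none.

    Simple types are strings such as "Int"; arrays are encoded as
    ("Array", element_type), a hypothetical encoding for this sketch.
    """
    # An untyped NULL can be promoted to anything, so it never constrains the result.
    types = [t for t in types if t != "Null"]
    if not types:
        return "Null"

    # Complex types: apply the rule recursively to the element types.
    if all(isinstance(t, tuple) and t[0] == "Array" for t in types):
        element = least_common_type([t[1] for t in types])
        return None if element is None else ("Array", element)
    if any(isinstance(t, tuple) for t in types):
        return None  # simple and complex types share no common type

    # Walk the first type's list from narrowest to widest and keep the
    # candidates that every other input can also reach.
    common = [c for c in PRECEDENCE[types[0]]
              if all(c in PRECEDENCE[t] for t in types)]
    if not common:
        return None

    result = common[0]
    if result == "Float" and any(t in PUSH_TO_DOUBLE for t in types):
        result = "Double"  # avoid losing digits when widening to FLOAT
    return result
```

For instance, `least_common_type(["Byte", "Long", "Null"])` yields `"Long"` and `least_common_type(["Int", "Date"])` yields `None`, mirroring the `coalesce` examples in this section.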
|
|
||
| #### Examples | ||
| The coalesce function accepts any set of argument types as long as they share a least common type. | ||
| The result type is the least common type of the arguments. | ||
| ```sql | ||
| > SELECT typeof(coalesce(1Y, 1L, NULL)); | ||
| BIGINT | ||
| > SELECT typeof(coalesce(1, DATE'2020-01-01')); | ||
| Error: Incompatible types [INT, DATE] | ||
|
|
||
| > SELECT typeof(coalesce(ARRAY(1Y), ARRAY(1L))); | ||
| ARRAY<BIGINT> | ||
| > SELECT typeof(coalesce(1, 1F)); | ||
| DOUBLE | ||
| > SELECT typeof(coalesce(1L, 1F)); | ||
| DOUBLE | ||
| > SELECT typeof(coalesce(1BD, 1F)); | ||
| DOUBLE | ||
|
|
||
| -- The substring function expects arguments of type INT for the start and length parameters. | ||
| > SELECT substring('hello', 1, 2); | ||
| he | ||
| > SELECT substring('hello', '1', 2); | ||
| he | ||
| > SELECT substring('hello', 1L, 2); | ||
| Error: Argument 2 requires an INT type. | ||
| > SELECT substring('hello', str, 2) | ||
| FROM VALUES(CAST('1' AS STRING)) AS T(str); | ||
| Error: Argument 2 requires an INT type. | ||
| ``` | ||
|
|
||
| ### SQL Functions | ||
|
|
||
| The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`). | ||
|
|
||
Suggested change: `Date-> Timestamp` to `Date -> Timestamp`
Done