docs/sql-migration-guide-upgrade.md (2 additions, 0 deletions)
@@ -101,6 +101,8 @@ displayTitle: Spark SQL Upgrading Guide

- In Spark version 2.4 and earlier, the `current_timestamp` function returns a timestamp with millisecond resolution only. Since Spark 3.0, the function can return the result with microsecond resolution if the underlying clock available on the system offers such resolution.

- In Spark version 2.4 and earlier, when reading a Hive Serde table with Spark native data sources (parquet/orc), Spark infers the actual file schema and updates the table schema in the metastore. Since Spark 3.0, Spark no longer infers the schema. This should not cause any problems for end users, but if it does, set `spark.sql.hive.caseSensitiveInferenceMode` to `INFER_AND_SAVE`.
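
Both bullets above are easy to check interactively. A minimal sketch, assuming a Hive-enabled Spark 3.0 session; only `current_timestamp` and the `spark.sql.hive.caseSensitiveInferenceMode` key come from the guide text, every other name is illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical session setup; enableHiveSupport() is needed for the
// Hive Serde table behavior described above.
val spark = SparkSession.builder()
  .appName("sql-migration-guide-examples")
  .enableHiveSupport()
  .getOrCreate()

// Since 3.0 the fractional seconds can carry microsecond precision
// (millisecond at most in 2.4 and earlier), clock permitting.
spark.sql("SELECT current_timestamp()").show(truncate = false)

// Opt back into the pre-3.0 behavior of inferring the case-sensitive
// file schema and saving it to the metastore table properties.
spark.conf.set("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")
```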

## Upgrading From Spark SQL 2.3 to 2.4

- In Spark version 2.3 and earlier, the second parameter to the `array_contains` function is implicitly promoted to the element type of the first (array) parameter. This type promotion can be lossy and may cause `array_contains` to return a wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause a change in behavior, as illustrated in the table below.
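
Since the illustrating table is truncated in this view, here is a hedged sketch of the kind of change involved (the result comments reflect my reading of the guide, not verified output; `spark` as in the earlier sketch):

```scala
// In 2.3 and earlier, the DOUBLE literal is cast down to the array's
// element type INT, so 1.34D collapses to 1 and the call can return true.
// In 2.4, the safer promotion compares both sides as DOUBLE instead,
// so the lossy match no longer happens.
spark.sql("SELECT array_contains(array(1), 1.34D)").show()
```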
@@ -584,19 +584,17 @@ object SQLConf {
}

  val HIVE_CASE_SENSITIVE_INFERENCE = buildConf("spark.sql.hive.caseSensitiveInferenceMode")
-    .doc("Sets the action to take when a case-sensitive schema cannot be read from a Hive " +
-      "table's properties. Although Spark SQL itself is not case-sensitive, Hive compatible file " +
-      "formats such as Parquet are. Spark SQL must use a case-preserving schema when querying " +
-      "any table backed by files containing case-sensitive field names or queries may not return " +
-      "accurate results. Valid options include INFER_AND_SAVE (the default mode-- infer the " +
-      "case-sensitive schema from the underlying data files and write it back to the table " +
-      "properties), INFER_ONLY (infer the schema but don't attempt to write it to the table " +
-      "properties) and NEVER_INFER (fallback to using the case-insensitive metastore schema " +
-      "instead of inferring).")
+    .internal()
+    .doc("Sets the action to take when a case-sensitive schema cannot be read from a Hive Serde " +
+      "table's properties when reading the table with Spark native data sources. Valid options " +
+      "include INFER_AND_SAVE (infer the case-sensitive schema from the underlying data files " +
+      "and write it back to the table properties), INFER_ONLY (infer the schema but don't " +
+      "attempt to write it to the table properties) and NEVER_INFER (the default mode-- fallback " +
+      "to using the case-insensitive metastore schema instead of inferring).")
     .stringConf
     .transform(_.toUpperCase(Locale.ROOT))
     .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
-    .createWithDefault(HiveCaseSensitiveInferenceMode.INFER_AND_SAVE.toString)
+    .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
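
With the default flipped to NEVER_INFER, a quick way to confirm which mode a session resolved to (config key from the diff above; `spark` as in the earlier sketch):

```scala
// Registered SQLConf entries return their default when unset, so on a
// build containing this change this prints NEVER_INFER.
val mode = spark.conf.get("spark.sql.hive.caseSensitiveInferenceMode")
println(s"caseSensitiveInferenceMode = $mode")
```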
Member:
https://github.com/apache/spark/pull/24041/files#diff-9a6b543db706f1a90f790783d6930a13R592 Nit for the doc: INFER_AND_SAVE is no longer the default. Should we mention this in the migration guide?

Member:
+1 for the comments: 1) update the config description at lines 592~593, and 2) update sql-migration-guide-upgrade.md.


val OPTIMIZER_METADATA_ONLY = buildConf("spark.sql.optimizer.metadataOnly")
.internal()