2 changes: 2 additions & 0 deletions docs/sql-migration-guide-upgrade.md
@@ -124,6 +124,8 @@ license: |

- In Spark version 2.4, when a spark session is created via `cloneSession()`, the newly created spark session inherits its configuration from its parent `SparkContext` even though the same configuration may exist with a different value in its parent spark session. Since Spark 3.0, the configurations of a parent `SparkSession` have a higher precedence over the parent `SparkContext`.

- Since Spark 3.0, the parquet logical type `TIMESTAMP_MICROS` is used by default when saving `TIMESTAMP` columns. In Spark version 2.4 and earlier, `TIMESTAMP` columns are saved as `INT96` in parquet files. Setting `spark.sql.parquet.outputTimestampType` to `INT96` restores the previous behavior.
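As an illustration only (this snippet is not part of the patch), the previous `INT96` behavior can be restored either on the session or via a SQL `SET` statement; the session builder settings below are placeholders:

```scala
// Hypothetical sketch: opting back into INT96 parquet timestamps in Spark 3.0+.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")          // placeholder master for a local run
  .appName("int96-timestamps") // placeholder app name
  .getOrCreate()

// Set it on the session configuration...
spark.conf.set("spark.sql.parquet.outputTimestampType", "INT96")

// ...or per-session via SQL.
spark.sql("SET spark.sql.parquet.outputTimestampType=INT96")
```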

## Upgrading from Spark SQL 2.4 to 2.4.1

- The value of `spark.executor.heartbeatInterval`, when specified without units like "30" rather than "30s", was
@@ -405,7 +405,7 @@ object SQLConf {
.stringConf
.transform(_.toUpperCase(Locale.ROOT))
.checkValues(ParquetOutputTimestampType.values.map(_.toString))
-      .createWithDefault(ParquetOutputTimestampType.INT96.toString)
+      .createWithDefault(ParquetOutputTimestampType.TIMESTAMP_MICROS.toString)

val PARQUET_INT64_AS_TIMESTAMP_MILLIS = buildConf("spark.sql.parquet.int64AsTimestampMillis")
.doc(s"(Deprecated since Spark 2.3, please set ${PARQUET_OUTPUT_TIMESTAMP_TYPE.key}.) " +
@@ -120,8 +120,12 @@ class ParquetInteroperabilitySuite extends ParquetCompatibilityTest with SharedS
).map { s => java.sql.Timestamp.valueOf(s) }
import testImplicits._
// match the column names of the file from impala
-      val df = spark.createDataset(ts).toDF().repartition(1).withColumnRenamed("value", "ts")
-      df.write.parquet(tableDir.getAbsolutePath)
+      withSQLConf(SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key ->
+          SQLConf.ParquetOutputTimestampType.INT96.toString) {
+        val df = spark.createDataset(ts).toDF().repartition(1)
+          .withColumnRenamed("value", "ts")
+        df.write.parquet(tableDir.getAbsolutePath)
+      }
FileUtils.copyFile(new File(impalaPath), new File(tableDir, "part-00001.parq"))

Seq(false, true).foreach { int96TimestampConversion =>
@@ -257,7 +257,7 @@ class SQLConfSuite extends QueryTest with SharedSQLContext {

// check default value
assert(spark.sessionState.conf.parquetOutputTimestampType ==
-        SQLConf.ParquetOutputTimestampType.INT96)
+        SQLConf.ParquetOutputTimestampType.TIMESTAMP_MICROS)

// PARQUET_INT64_AS_TIMESTAMP_MILLIS should be respected.
spark.sessionState.conf.setConf(SQLConf.PARQUET_INT64_AS_TIMESTAMP_MILLIS, true)