Changes from 1 commit
Address comment.
dongjoon-hyun committed Feb 15, 2018
commit 276963369b663c99aebcde10f4329f479f43b3ea
7 changes: 6 additions & 1 deletion docs/sql-programming-guide.md
@@ -1006,7 +1006,12 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

## ORC Files

-Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files. To do that, the following configurations are newly added. The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC serde table (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
+Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
+To do that, the following configurations are newly added. The vectorized reader is used for the
+native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl`
+is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC
+serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
+the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
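The three configurations named above can be sketched together as a `spark-defaults.conf` fragment. This combination is illustrative only and is not part of this patch:

```
# Use the native ORC implementation with the vectorized reader (Spark 2.3+)
spark.sql.orc.impl                      native
spark.sql.orc.enableVectorizedReader    true
# Also required for Hive ORC serde tables to use the vectorized reader
spark.sql.hive.convertMetastoreOrc      true
```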
dongjoon-hyun (Member, Author) commented:

@viirya . I split it into multiple lines. Could you point it out once more?

viirya (Member) commented on Feb 15, 2018:

when `spark.sql.hive.convertMetastoreOrc` is (also) set to `true`?

dongjoon-hyun (Member, Author) replied:

Thank you. I see.


<table class="table">
<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
@@ -36,8 +36,11 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
}

 override def afterAll(): Unit = {
-  spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
-  super.afterAll()
+  try {
+    spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
+  } finally {
+    super.afterAll()
+  }
 }
dongjoon-hyun (Member, Author) commented:

The test coverage is the same.
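The try/finally restructuring above guarantees that `super.afterAll()` still runs even if `unsetConf` throws. A minimal Python sketch of the same cleanup-ordering guarantee (not Spark code; the function names are stand-ins for the Scala calls in the diff):

```python
# Illustrative only: why wrapping the first cleanup step in try/finally
# matters. The second step runs even when the first one raises, which
# mirrors the afterAll() change in this patch.

log = []

def unset_conf():
    # stands in for spark.sessionState.conf.unsetConf(...)
    log.append("unsetConf")
    raise RuntimeError("failure during cleanup")

def super_after_all():
    # stands in for super.afterAll()
    log.append("super.afterAll")

def after_all():
    try:
        unset_conf()
    finally:
        super_after_all()  # runs even though unset_conf raised

try:
    after_all()
except RuntimeError:
    pass  # the exception still propagates after the finally block

print(log)  # ['unsetConf', 'super.afterAll']
```

Without the try/finally, an exception in the first call would skip the superclass cleanup entirely, which is what the original two-line body risked.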


private val allFileBasedDataSources = Seq("orc", "parquet", "csv", "json", "text")
@@ -39,8 +39,11 @@ class FileStreamSinkSuite extends StreamTest {
}

 override def afterAll(): Unit = {
-  spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
-  super.afterAll()
+  try {
+    spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
+  } finally {
+    super.afterAll()
+  }
 }

test("unpartitioned writing and batch reading") {
@@ -213,8 +213,11 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
}

 override def afterAll(): Unit = {
-  spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
-  super.afterAll()
+  try {
+    spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
+  } finally {
+    super.afterAll()
+  }
 }

// ============= Basic parameter exists tests ================