Use 'hive' for ORC
dongjoon-hyun committed Feb 14, 2018
commit 2d74b204b85db1ffcfb164a160e8f6f0d02d3f4b
2 changes: 1 addition & 1 deletion docs/sql-programming-guide.md
@@ -1784,7 +1784,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
<tr>
<td><code>spark.sql.orc.impl</code></td>
-<td><code>native</code></td>
+<td><code>hive</code></td>
Member

We do not need this in the migration guide. Please create a new section for ORC.

Member Author

Yep.

Is there a reason the impl was changed back to the old implementation? This breaks `spark.read.orc`.
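For readers hitting the same problem, a minimal sketch of a per-session override, assuming the configuration stays settable at runtime as the `SQLConf` entry in this diff suggests. The object name `OrcImplOverride`, the helper `useOrcImpl`, and the temporary path are hypothetical and only for illustration; the sketch needs Spark SQL on the classpath and is not a confirmed workaround for the breakage reported above.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object OrcImplOverride {
  // Sets spark.sql.orc.impl for this session only and returns the value now in effect.
  // Accepted values per the diff's checkValues call: "hive" or "native".
  def useOrcImpl(spark: SparkSession, impl: String): String = {
    spark.conf.set("spark.sql.orc.impl", impl)
    spark.conf.get("spark.sql.orc.impl")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-impl-override")
      .getOrCreate()
    try {
      println(s"spark.sql.orc.impl = ${useOrcImpl(spark, "native")}")

      // Hypothetical scratch location; the target directory must not exist yet.
      val path = Files.createTempDirectory("orc-demo").resolve("data").toString
      spark.range(5).write.orc(path)
      println(s"rows read back: ${spark.read.orc(path).count()}")
    } finally {
      spark.stop()
    }
  }
}
```

The same key can also be passed at submit time with `--conf spark.sql.orc.impl=native`, which avoids touching application code.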

<td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4.1. `hive` means the ORC library in Hive 1.2.1 which is used prior to Spark 2.3.</td>
</tr>
<tr>
@@ -399,11 +399,11 @@ object SQLConf {

   val ORC_IMPLEMENTATION = buildConf("spark.sql.orc.impl")
     .doc("When native, use the native version of ORC support instead of the ORC library in Hive " +
-      "1.2.1. It is 'hive' by default prior to Spark 2.3.")
+      "1.2.1. It is 'hive' by default.")
     .internal()
     .stringConf
     .checkValues(Set("hive", "native"))
-    .createWithDefault("native")
+    .createWithDefault("hive")
Member

We also need to disable the ORC pushdown, because the ORC reader of Hive 1.2.1 has a few bugs.

Member Author

dongjoon-hyun Feb 14, 2018

BTW, we don't have a test case for that, do we? Actually, I want to have a test case for that.
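One possible shape for such a check, sketched under assumptions: it toggles `spark.sql.orc.filterPushdown` (the existing Spark SQL key for ORC predicate pushdown) and compares filtered row counts with pushdown on and off. The object name `OrcPushdownCheck`, the helper `countWithPushdown`, the predicate, and the scratch path are hypothetical; this is not the test that was eventually added to Spark, and it runs against whichever `spark.sql.orc.impl` the session is configured with.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object OrcPushdownCheck {
  // Counts rows matching `predicate` with ORC filter pushdown switched on or off.
  def countWithPushdown(
      spark: SparkSession,
      path: String,
      predicate: String,
      pushdown: Boolean): Long = {
    spark.conf.set("spark.sql.orc.filterPushdown", pushdown.toString)
    spark.read.orc(path).filter(predicate).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-pushdown-check")
      .getOrCreate()
    try {
      // Hypothetical scratch location; the target directory must not exist yet.
      val path = Files.createTempDirectory("orc-pushdown").resolve("data").toString
      spark.range(100).write.orc(path)

      val withPushdown = countWithPushdown(spark, path, "id > 90", pushdown = true)
      val withoutPushdown = countWithPushdown(spark, path, "id > 90", pushdown = false)

      // A pushdown bug in the reader would show up as a mismatch between the two counts.
      assert(withPushdown == withoutPushdown,
        s"pushdown changed the result: $withPushdown vs $withoutPushdown")
    } finally {
      spark.stop()
    }
  }
}
```

A fuller test would cover the data types and predicates known to be problematic in the Hive 1.2.1 reader; the single `id > 90` filter here only demonstrates the comparison pattern.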


val ORC_VECTORIZED_READER_ENABLED = buildConf("spark.sql.orc.enableVectorizedReader")
.doc("Enables vectorized orc decoding.")