
Conversation

@rajeshbalamohan

What changes were proposed in this pull request?

Physical files stored in Hive as ORC have internal column names such as _col1, _col2, etc., while the actual column mapping is kept in the Hive metastore. Querying ORC tables stored in Hive via Spark's beeline client worked in earlier branches, but it is broken on the master branch. When reading ORC files, it would be good to map the Hive schema to the physical schema to support backward compatibility. This PR addresses this issue.
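To illustrate the idea, here is a minimal Scala sketch of the kind of mapping involved. It is not the actual change in this PR; `metastoreSchema`, `requestedColumns`, and `mapToPhysicalNames` are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.sql.types.StructType

// ORC files written by Hive name their columns _col0, _col1, ..., while the
// logical column names live only in the Hive metastore schema. Resolving
// requested columns by name against the file therefore fails, but the ordinal
// positions line up, so logical names can be translated to physical ones by index.
def mapToPhysicalNames(
    metastoreSchema: StructType,
    requestedColumns: Seq[String]): Seq[String] = {
  // Position of each logical column in the metastore schema.
  val ordinalByName = metastoreSchema.fieldNames.zipWithIndex.toMap
  // Translate each requested logical name to the Hive-internal physical name.
  requestedColumns.map(name => s"_col${ordinalByName(name)}")
}

// Example: a table declared as (id INT, name STRING) and stored as ORC by Hive
// physically contains columns _col0 and _col1; requesting "name" resolves to "_col1".
```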

How was this patch tested?

Manual execution of TPC-DS queries at 200 GB scale.



@AmplabJenkins

Can one of the admins verify this patch?

@rajeshbalamohan changed the title from "SPARK-14387. [SQL] Exceptions thrown when querying ORC tables stored …" to "[SPARK-14387][SQL] Exceptions thrown when querying ORC tables in Hive" on Apr 22, 2016
@rajeshbalamohan
Author

Changes:

  • Rebased patch to master branch
  • Removed OrcTableScan as it is not used anywhere.

@rajeshbalamohan
Author

cc @liancheng

@rajeshbalamohan
Author

cc @liancheng, @rxin - can you please review when you find time?

@yuananf

yuananf commented Jul 22, 2016

I tested this patch, and it fixes the bug.
@rajeshbalamohan, it looks like this needs a rebase.
When will this get merged? We really need it, otherwise we will run into a lot of compatibility issues.

@rajeshbalamohan
Author

@yuananf Thanks for trying it out. I have rebased it and created #14471. Closing this one.
