[SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileFormat based on ORC 1.4.1 #19651
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -40,14 +40,19 @@ private[orc] class OrcDeserializer( |
| | | private[this] val length = requiredSchema.length |
| | | - private[this] val unwrappers = requiredSchema.map(_.dataType).map(unwrapperFor).toArray |
| | | + private[this] val unwrappers = requiredSchema.map { f => |
| | | + if (missingColumnNames.contains(f.name)) { |
| | | + (value: Any, row: InternalRow, ordinal: Int) => row.setNullAt(ordinal) |
| | | + } else { |
| | | + unwrapperFor(f.dataType) |
| | | + } |
| | | + }.toArray |
| | | def deserialize(orcStruct: OrcStruct): InternalRow = { |
| | | - var i = 0 |
| | | val names = orcStruct.getSchema.getFieldNames |
| | | - while (i < length) { |
| | | - val name = requiredSchema(i).name |
| | | - val writable = if (missingColumnNames.contains(name)) { |
| | | + val fieldRefs = requiredSchema.map { f => |
| | | + val name = f.name |
| | | + if (missingColumnNames.contains(name)) { |
| | | null |
| | | } else { |
| | | if (names.contains(name)) { |
| | | @@ -56,6 +61,11 @@ private[orc] class OrcDeserializer( |
| | | orcStruct.getFieldValue("_col" + dataSchema.fieldIndex(name)) |
| | | } |
| | | } |
| | | + }.toArray |
| | | + var i = 0 |
| | | + while (i < length) { |
| | | + val writable = fieldRefs(i) |
| | | if (writable == null) { |
| | | mutableRow.setNullAt(i) |
| | | } else { |
Doing this kind of thing in the loop is pretty slow, especially on this critical path. Let's pre-compute it before the loop. Again, PLEASE follow the previous code, which did this check when creating the unwrapper.
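To make the suggestion concrete, here is a minimal, self-contained sketch of deciding once per field, at construction time, whether a column is missing, so that the per-row loop only indexes into a precomputed array. It does not use Spark or ORC classes: `Field`, `MutableRow`, `buildUnwrappers`, and the simplified `unwrapperFor` are hypothetical stand-ins for illustration, and null handling for columns that are present is left out.

```scala
object PrecomputedUnwrappers {
  // Hypothetical stand-in for a schema field (name plus a simplified type tag).
  final case class Field(name: String, dataType: String)

  // Hypothetical stand-in for a mutable row that is reused across records.
  final class MutableRow(size: Int) {
    private val values = new Array[Any](size)
    def update(ordinal: Int, value: Any): Unit = values(ordinal) = value
    def setNullAt(ordinal: Int): Unit = values(ordinal) = null
    def get(ordinal: Int): Any = values(ordinal)
  }

  type Unwrapper = (Any, MutableRow, Int) => Unit

  // Shared setter for columns that are absent from the file.
  private val setNull: Unwrapper = (_, row, ordinal) => row.setNullAt(ordinal)

  // Simplified per-type unwrapper; a real one would cover many more types.
  def unwrapperFor(dataType: String): Unwrapper = dataType match {
    case "int" => (value, row, ordinal) => row.update(ordinal, value.asInstanceOf[Int])
    case _     => (value, row, ordinal) => row.update(ordinal, value.toString)
  }

  // The missing-column check runs once per field here, not once per row.
  def buildUnwrappers(schema: Seq[Field], missing: Set[String]): Array[Unwrapper] =
    schema.map { f =>
      if (missing.contains(f.name)) setNull else unwrapperFor(f.dataType)
    }.toArray

  // Hot path: no name lookups, only array indexing and a function call.
  def deserialize(values: Array[Any], unwrappers: Array[Unwrapper], row: MutableRow): Unit = {
    var i = 0
    while (i < unwrappers.length) {
      unwrappers(i)(values(i), row, i)
      i += 1
    }
  }
}
```

Because the row is reused, the precomputed null setter also clears any value left over from a previous record for a missing column.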
BTW do we really need to handle missing columns for nested fields?
[…] `column-name` logic from `while`, but it still requires `OrcStruct` because we don't have `StructObjectInspector`. So, we cannot move this out of `iter.map`.
[…] `nested fields` here?