-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35898][SQL] Fix arrays and maps in RowToColumnConverter #33108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @cloud-fan |
maropu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine otherwise.
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
Show resolved
Hide resolved
| case StringType => StringConverter | ||
| case CalendarIntervalType => CalendarConverter | ||
| case at: ArrayType => new ArrayConverter(getConverterForType(at.elementType, nullable)) | ||
| case at: ArrayType => ArrayConverter(getConverterForType(at.elementType, at.containsNull)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Nice catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be nullable || at.containsNull?
| assert(testVector.getArray(3).toIntArray() === Array(3, 4, 5)) | ||
| } | ||
|
|
||
| testVectors("array append", 1, arrayType) { testVector => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug fix, so could you add the prefix: SPARK-35898: in the test names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will add them! I did not add before them since they test general usage patterns instead of a specific edge case.
sql/core/src/test/scala/org/apache/spark/sql/execution/RowToColumnConverterSuite.scala
Outdated
Show resolved
Hide resolved
|
@maropu Thank you for the review! I addressed your feedback. PTAL :) |
revans2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning up after me.
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Merging to master/3.1 |
### What changes were proposed in this pull request? This PR fixes support for arrays and maps in `RowToColumnConverter`. In particular this PR fixes two bugs: 1. `appendArray` in `WritableColumnVector` does not reserve any elements in its child arrays, which causes the assertion in `OffHeapColumnVector.putArray` to fail. 2. The nullability of the child columns is propagated incorrectly when creating the child converters of `ArrayConverter` and `MapConverter` in `RowToColumnConverter`. This PR fixes these issues. ### Why are the changes needed? Both bugs cause an exception to be thrown. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? I added additional test cases to `ColumnVectorSuite` to catch the first bug, and I added `RowToColumnConverterSuite` to catch the both bugs (but specifically the second). Closes #33108 from tomvanbussel/SPARK-35898. Authored-by: Tom van Bussel <[email protected]> Signed-off-by: herman <[email protected]> (cherry picked from commit c660650) Signed-off-by: herman <[email protected]>
| case mt: MapType => new MapConverter(getConverterForType(mt.keyType, nullable), | ||
| getConverterForType(mt.valueType, nullable)) | ||
| case mt: MapType => MapConverter(getConverterForType(mt.keyType, nullable = false), | ||
| getConverterForType(mt.valueType, mt.valueContainsNull)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, nullable || mt.valueContainsNull?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the map itself is null, I think we won't invoke the converter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense.
### What changes were proposed in this pull request? This PR fixes support for arrays and maps in `RowToColumnConverter`. In particular this PR fixes two bugs: 1. `appendArray` in `WritableColumnVector` does not reserve any elements in its child arrays, which causes the assertion in `OffHeapColumnVector.putArray` to fail. 2. The nullability of the child columns is propagated incorrectly when creating the child converters of `ArrayConverter` and `MapConverter` in `RowToColumnConverter`. This PR fixes these issues. ### Why are the changes needed? Both bugs cause an exception to be thrown. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? I added additional test cases to `ColumnVectorSuite` to catch the first bug, and I added `RowToColumnConverterSuite` to catch the both bugs (but specifically the second). Closes apache#33108 from tomvanbussel/SPARK-35898. Authored-by: Tom van Bussel <[email protected]> Signed-off-by: herman <[email protected]> (cherry picked from commit c660650) Signed-off-by: herman <[email protected]> (cherry picked from commit fe412b6)
### What changes were proposed in this pull request? This PR fixes support for arrays and maps in `RowToColumnConverter`. In particular this PR fixes two bugs: 1. `appendArray` in `WritableColumnVector` does not reserve any elements in its child arrays, which causes the assertion in `OffHeapColumnVector.putArray` to fail. 2. The nullability of the child columns is propagated incorrectly when creating the child converters of `ArrayConverter` and `MapConverter` in `RowToColumnConverter`. This PR fixes these issues. ### Why are the changes needed? Both bugs cause an exception to be thrown. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? I added additional test cases to `ColumnVectorSuite` to catch the first bug, and I added `RowToColumnConverterSuite` to catch the both bugs (but specifically the second). Closes apache#33108 from tomvanbussel/SPARK-35898. Authored-by: Tom van Bussel <[email protected]> Signed-off-by: herman <[email protected]> (cherry picked from commit c660650) Signed-off-by: herman <[email protected]>
What changes were proposed in this pull request?
This PR fixes support for arrays and maps in
RowToColumnConverter. In particular this PR fixes two bugs:appendArrayinWritableColumnVectordoes not reserve any elements in its child arrays, which causes the assertion inOffHeapColumnVector.putArrayto fail.ArrayConverterandMapConverterinRowToColumnConverter.This PR fixes these issues.
Why are the changes needed?
Both bugs cause an exception to be thrown.
Does this PR introduce any user-facing change?
No
How was this patch tested?
I added additional test cases to
ColumnVectorSuiteto catch the first bug, and I addedRowToColumnConverterSuiteto catch the both bugs (but specifically the second).