Contributor

@tomvanbussel commented Jun 27, 2021

What changes were proposed in this pull request?

This PR fixes support for arrays and maps in RowToColumnConverter. In particular, it fixes two bugs:

  1. appendArray in WritableColumnVector does not reserve any elements in its child arrays, which causes the assertion in OffHeapColumnVector.putArray to fail.
  2. The nullability of the child columns is propagated incorrectly when creating the child converters of ArrayConverter and MapConverter in RowToColumnConverter.

This PR fixes these issues.
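
To illustrate the first bug, here is a minimal sketch and not Spark's actual code. `ToyVector` and all of its members are hypothetical stand-ins for `WritableColumnVector`, and the assertion only mimics the kind of bounds check the description attributes to `OffHeapColumnVector.putArray`. It shows why appending an array has to reserve space in the child vector before the offset and length are recorded:

```scala
// Hypothetical toy model; ToyVector is NOT Spark's WritableColumnVector.
final class ToyVector(initialCapacity: Int, val child: Option[ToyVector] = None) {
  private var capacity: Int = initialCapacity
  var elementsAppended: Int = 0
  // (offset, length) of each appended array inside the child vector.
  private var offsets = new Array[Int](initialCapacity)
  private var lengths = new Array[Int](initialCapacity)

  def currentCapacity: Int = capacity

  // Grow the backing storage so that at least `required` elements fit.
  def reserve(required: Int): Unit = {
    if (required > capacity) {
      val newCapacity = math.max(required, capacity * 2)
      offsets = java.util.Arrays.copyOf(offsets, newCapacity)
      lengths = java.util.Arrays.copyOf(lengths, newCapacity)
      capacity = newCapacity
    }
  }

  // Assumed bounds check: the array must fit in the child's current capacity.
  private def putArray(rowId: Int, offset: Int, length: Int): Unit = {
    assert(offset + length <= child.get.currentCapacity, "child vector too small")
    offsets(rowId) = offset
    lengths(rowId) = length
  }

  // Buggy variant: reserves a slot in the parent only.
  def appendArrayBuggy(length: Int): Int = {
    reserve(elementsAppended + 1)
    putArray(elementsAppended, child.get.elementsAppended, length)
    elementsAppended += 1
    elementsAppended - 1
  }

  // Fixed variant: also reserves `length` more elements in the child.
  def appendArrayFixed(length: Int): Int = {
    reserve(elementsAppended + 1)
    val c = child.get
    c.reserve(c.elementsAppended + length)
    putArray(elementsAppended, c.elementsAppended, length)
    elementsAppended += 1
    elementsAppended - 1
  }
}

object AppendArrayDemo {
  def main(args: Array[String]): Unit = {
    // The child starts with room for 2 elements; we append an array of 5.
    val fixed = new ToyVector(4, Some(new ToyVector(2)))
    val rowId = fixed.appendArrayFixed(5)
    println(s"appended array at row $rowId, child capacity ${fixed.child.get.currentCapacity}")
  }
}
```

In this toy model, calling `appendArrayBuggy(5)` on the same setup trips the assertion, because the child still has its initial capacity of 2.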

Why are the changes needed?

Both bugs cause an exception to be thrown.

Does this PR introduce any user-facing change?

No

How was this patch tested?

I added additional test cases to ColumnVectorSuite to catch the first bug, and I added RowToColumnConverterSuite to catch both bugs (but specifically the second).

@github-actions github-actions bot added the SQL label Jun 27, 2021
@tomvanbussel
Contributor Author

cc @cloud-fan

@cloud-fan
Contributor

@viirya @maropu @revans2

Member

@maropu left a comment

Looks fine otherwise.

  case StringType => StringConverter
  case CalendarIntervalType => CalendarConverter
- case at: ArrayType => new ArrayConverter(getConverterForType(at.elementType, nullable))
+ case at: ArrayType => ArrayConverter(getConverterForType(at.elementType, at.containsNull))

Member

Ah, I see. Nice catch.

Member

Shouldn't it be nullable || at.containsNull?

assert(testVector.getArray(3).toIntArray() === Array(3, 4, 5))
}

testVectors("array append", 1, arrayType) { testVector =>

Member

This is a bug fix, so could you add the prefix: SPARK-35898: in the test names?

Contributor Author

Sure, I will add them! I did not add them before since they test general usage patterns rather than a specific edge case.

@tomvanbussel
Contributor Author

@maropu Thank you for the review! I addressed your feedback. PTAL :)

Contributor

@revans2 left a comment

Thanks for cleaning up after me.

Contributor

@hvanhovell left a comment

LGTM

@hvanhovell
Contributor

Merging to master/3.1

hvanhovell pushed a commit that referenced this pull request Jun 28, 2021
Closes #33108 from tomvanbussel/SPARK-35898.

Authored-by: Tom van Bussel <[email protected]>
Signed-off-by: herman <[email protected]>
(cherry picked from commit c660650)
Signed-off-by: herman <[email protected]>

- case mt: MapType => new MapConverter(getConverterForType(mt.keyType, nullable),
-   getConverterForType(mt.valueType, nullable))
+ case mt: MapType => MapConverter(getConverterForType(mt.keyType, nullable = false),
+   getConverterForType(mt.valueType, mt.valueContainsNull))

Member

ditto, nullable || mt.valueContainsNull?

Contributor

If the map itself is null, I think we won't invoke the converter?

Member

makes sense.
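
The conclusion of this thread can be sketched with a small model. This is a simplified illustration under stated assumptions, not the code in `RowToColumnConverter`: every type and name below (`DType`, `Converter`, `NullableConv`, `converterFor`) is hypothetical. The assumption is that a nullable wrapper handles a null container itself and never calls the inner converter for it, so the child converters only need the element-level flags (`containsNull`, `valueContainsNull`), and map keys are treated as never null.

```scala
// Hypothetical, simplified model; none of these types are Spark's.
sealed trait DType
case object IntT extends DType
final case class ArrayT(element: DType, containsNull: Boolean) extends DType
final case class MapT(key: DType, value: DType, valueContainsNull: Boolean) extends DType

sealed trait Converter
case object IntConv extends Converter
final case class ArrayConv(child: Converter) extends Converter
final case class MapConv(key: Converter, value: Converter) extends Converter
// Wrapper that handles a null value itself and never calls `inner` for it.
final case class NullableConv(inner: Converter) extends Converter

object ConverterSketch {
  def converterFor(dt: DType, nullable: Boolean): Converter = {
    val core: Converter = dt match {
      case IntT => IntConv
      // Element nullability comes from the array type, not from the column.
      case ArrayT(elem, containsNull) =>
        ArrayConv(converterFor(elem, containsNull))
      // Keys are treated as never null; values follow valueContainsNull.
      case MapT(k, v, valueContainsNull) =>
        MapConv(converterFor(k, nullable = false), converterFor(v, valueContainsNull))
    }
    // Nullability of the value itself is handled once, at this level.
    if (nullable) NullableConv(core) else core
  }

  def main(args: Array[String]): Unit = {
    // A non-nullable column whose arrays may contain null elements.
    println(converterFor(ArrayT(IntT, containsNull = true), nullable = false))
    // A nullable column whose arrays never contain null elements.
    println(converterFor(ArrayT(IntT, containsNull = false), nullable = true))
  }
}
```

The first case in `main` is the one that passing the column's `nullable` flag down to the element converter gets wrong: in this model it would yield `ArrayConv(IntConv)`, leaving nothing to handle the null elements.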

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
Closes apache#33108 from tomvanbussel/SPARK-35898.

Authored-by: Tom van Bussel <[email protected]>
Signed-off-by: herman <[email protected]>
(cherry picked from commit c660650)
Signed-off-by: herman <[email protected]>
(cherry picked from commit fe412b6)
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
Closes apache#33108 from tomvanbussel/SPARK-35898.

Authored-by: Tom van Bussel <[email protected]>
Signed-off-by: herman <[email protected]>
(cherry picked from commit c660650)
Signed-off-by: herman <[email protected]>