@marin-ma
Contributor

What changes were proposed in this pull request?

Reset the WritableColumnVector when getting "next" ColumnarBatch in RowToColumnarExec

Why are the changes needed?

When converting Iterator[InternalRow] to Iterator[ColumnarBatch], the vectors used to create a new ColumnarBatch should be reset in the iterator's "next()" method.
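To illustrate why the reset matters, here is a minimal sketch in plain Java. It does not use Spark: `ToyVector`, `convert`, and the `resetEachBatch` flag are hypothetical names invented for this illustration, and the toy vector only mimics the append-and-reset behaviour described above. The vector is reused across batches, and each batch is read from row 0, so skipping the reset makes later batches expose rows left over from the first batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ResetSketch {
    /** Toy stand-in for a reusable writable vector (hypothetical, not Spark's class). */
    static final class ToyVector {
        private int[] data = new int[4];
        private int elementsAppended = 0;

        void appendInt(int value) {
            // Grow the backing array when it is full.
            if (elementsAppended == data.length) {
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[elementsAppended++] = value;
        }

        int getInt(int rowId) {
            return data[rowId];
        }

        /** Rewinds the append position so the next batch starts at row 0. */
        void reset() {
            elementsAppended = 0;
        }
    }

    /**
     * Splits rows into batches of at most batchSize rows, reusing one vector.
     * Each returned array holds what a consumer would read from rows 0..rowCount-1.
     */
    static List<int[]> convert(Iterator<Integer> rows, int batchSize, boolean resetEachBatch) {
        ToyVector vector = new ToyVector();
        List<int[]> batches = new ArrayList<>();
        while (rows.hasNext()) {
            if (resetEachBatch) {
                vector.reset(); // the step this sketch is about
            }
            int rowCount = 0;
            while (rowCount < batchSize && rows.hasNext()) {
                vector.appendInt(rows.next());
                rowCount++;
            }
            int[] seen = new int[rowCount];
            for (int i = 0; i < rowCount; i++) {
                seen[i] = vector.getInt(i);
            }
            batches.add(seen);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5);
        for (boolean reset : new boolean[] {true, false}) {
            StringBuilder out = new StringBuilder("reset=" + reset + ":");
            for (int[] batch : convert(input.iterator(), 2, reset)) {
                out.append(' ').append(Arrays.toString(batch));
            }
            System.out.println(out);
        }
    }
}
```

With the reset, the five input rows come back as three batches holding the original values in order. Without it, the second and third batches repeat the leading rows of the first batch, because new values are appended past the positions the consumer reads.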

Does this PR introduce any user-facing change?

No

How was this patch tested?

N/A

@carsonwang
Contributor

cc @cloud-fan @revans2

@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

do we have a unit test?

@revans2
Contributor

revans2 commented Oct 16, 2019

Good catch, yes they need to be reset when reused.

@marin-ma
Contributor Author

> do we have a unit test?

Not yet. I can add one if needed.
The default number of rows in one ColumnarBatch is set to 10000 in SQLConf, so this bug is only triggered when the number of rows is larger than 10000.
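The threshold can be made concrete with a small arithmetic sketch. The helper name `batchesNeeded` is hypothetical and the code is plain Java, not Spark. It assumes rows are packed into batches of at most the configured size, so the vectors are only reused once a second batch is needed.

```java
class BatchCount {
    /** Batches needed to hold rowCount rows at batchSize rows per batch (ceiling division). */
    static long batchesNeeded(long rowCount, int batchSize) {
        if (rowCount < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rowCount must be >= 0 and batchSize must be > 0");
        }
        return (rowCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int batchSize = 10000; // the default mentioned above
        System.out.println(batchesNeeded(10000, batchSize)); // one batch, vectors never reused
        System.out.println(batchesNeeded(10001, batchSize)); // two batches, vectors reused once
    }
}
```

A test that wants to exercise the reuse path therefore needs either more rows than the batch size or a smaller batch size.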

@SparkQA

SparkQA commented Oct 16, 2019

Test build #112161 has finished for PR 26137 at commit 7b9036b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 17, 2019

Test build #112200 has finished for PR 26137 at commit 97631bd.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 18, 2019

Test build #112245 has finished for PR 26137 at commit 114a743.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marin-ma
Contributor Author

@cloud-fan Could you help to review this? I've added one unit test.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-29490][SQL]Reset 'WritableColumnVector' in 'RowToColumnarExec'" to "[SPARK-29490][SQL] Reset 'WritableColumnVector' in 'RowToColumnarExec'" Oct 24, 2019
@dongjoon-hyun
Member

Welcome to the Apache Spark community, @rongma1997! I left a few comments.

@marin-ma marin-ma closed this Oct 25, 2019
@marin-ma marin-ma reopened this Oct 25, 2019
@dongjoon-hyun
Member

cc @gatorsmile

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Merged to master.
Thank you all.

@dongjoon-hyun
Member

You are added to the Apache Spark contributor group, @rongma1997.

@marin-ma marin-ma deleted the reset-WritableColumnVector branch October 28, 2019 01:39
6 participants