@cloud-fan
Contributor

What changes were proposed in this pull request?

In #19980, we assumed anyNullsSet could be implemented simply as numNulls() > 0. This is logically correct, but it can cause performance problems.

OrcColumnVector is an example. It has no numNulls property, only a noNulls property, so we will lose a lot of performance if we use numNulls() > 0 to check for nulls.

This PR simply reverts #19980, renaming the method to hasNull. Better name suggestions are welcome, e.g. nullable?
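
To illustrate the cost difference, here is a minimal sketch and not the actual Spark code. It assumes a vector that, like the ORC case described above, stores only a noNulls flag and a per-slot isNull array. The names HasNullSketch, SketchVector, and OrcLikeVector are hypothetical and exist only for this example.

```java
// Hypothetical sketch: none of these types are Spark's or ORC's real classes.
class HasNullSketch {
    public static void main(String[] args) {
        boolean[] isNull = new boolean[1_000_000];
        isNull[999_999] = true; // a single null at the very end of the batch
        SketchVector v = new OrcLikeVector(isNull, /* noNulls = */ false);

        // Constant time: only reads the flag the batch already carries.
        System.out.println(v.hasNull());
        // Linear time: must visit every slot just to compare the count with 0.
        System.out.println(v.numNulls() > 0);
    }
}

// Assumed minimal interface exposing both ways of asking about nulls.
interface SketchVector {
    boolean hasNull();
    int numNulls();
}

// Assumed ORC-like layout: a boolean flag plus per-slot null markers,
// with no stored null count.
final class OrcLikeVector implements SketchVector {
    private final boolean[] isNull;
    private final boolean noNulls;

    OrcLikeVector(boolean[] isNull, boolean noNulls) {
        this.isNull = isNull;
        this.noNulls = noNulls;
    }

    @Override
    public boolean hasNull() {
        // Answered directly from the flag, without touching the array.
        return !noNulls;
    }

    @Override
    public int numNulls() {
        if (noNulls) {
            return 0;
        }
        // No stored count, so an exact answer requires a full scan.
        int count = 0;
        for (boolean slotIsNull : isNull) {
            if (slotIsNull) {
                count++;
            }
        }
        return count;
    }
}
```

In this sketch, hasNull() is answered from the flag alone, while numNulls() > 0 pays for a scan of the whole array each time it is called on a vector that may contain nulls.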

How was this patch tested?

Existing tests.

@cloud-fan
Contributor Author

cloud-fan commented Jan 31, 2018

cc @kiszk @ueshin @viirya @gatorsmile

@ueshin
Member

ueshin commented Jan 31, 2018

LGTM, and the name hasNull sounds fine to me, too.

@SparkQA

SparkQA commented Jan 31, 2018

Test build #86869 has finished for PR 20452 at commit e68002f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Jan 31, 2018

Overall LGTM. One question: in #19980, we replaced anyNullsSet with predicates on numNulls (such as numNulls == 0 or numNulls > 0) in many places. Should we use hasNull for those checks now?

@kiszk
Member

kiszk commented Jan 31, 2018

LGTM. hasNull is fine with me, too.

@SparkQA

SparkQA commented Jan 31, 2018

Test build #86874 has finished for PR 20452 at commit 3b965f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

Thanks, merging to master/2.3!

asfgit pushed a commit that referenced this pull request Jan 31, 2018

Author: Wenchen Fan <[email protected]>

Closes #20452 from cloud-fan/null.

(cherry picked from commit 48dd6a4)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 48dd6a4 Jan 31, 2018
@dongjoon-hyun
Member

Hi, @cloud-fan.
I reopened the JIRA since it's technically reverted. You can resolve it again with a new title like Renaming anyNullsSet to hasNull or something.

@cloud-fan
Contributor Author

Thanks, @dongjoon-hyun!
