Skip to content

Conversation

@gszadovszky
Copy link
Contributor

Because of the ambigous sorting order of float/double the following changes made at the reading path of the related statistics:

  • Ignoring statistics in case of it contains a NaN value.
  • Using -0.0 as min value and +0.0 as max value independently from which 0.0 value was saved in the statistics.

Float max = stats.genericGetMax();
// Drop min/max values in case of NaN as the sorting order of values is undefined for this case
if (min.isNaN() || max.isNaN()) {
stats.setMinMax(0.0f, 0.0f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary. On the other hand, this is the default value when it's unset, so I'm fine with leaving this.

((Statistics<?>) stats).hasNonNullValue = false;
} else {
// Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
if (min == 0.0f) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+0 and -0 are equal according to ==. Please use Float.compareTo() instead.

// Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
if (min == 0.0f) {
stats.setMinMax(-0.0f, max);
min = -0.0f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) In my opinion,

min = -0.0f;
stats.setMinMax(min, max);

would be cleaner. Similar for the max case (although max is not used any more in this function).

}

@Override
public Statistics<?> build() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as for the float case.

assertEquals(max, stats.genericGetMax());
}

@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test case for -0 and +0.

Copy link
Contributor

@zivanfi zivanfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. I'll merge this on Monday if nobody else requests any more changes.

@zivanfi zivanfi closed this in 0a86429 Mar 19, 2018
legend-hua pushed a commit to legend-hua/parquet-mr that referenced this pull request Mar 23, 2018
Because of the ambigous sorting order of float/double the following changes made at the reading path of the related statistics:
- Ignoring statistics in case of it contains a NaN value.
- Using -0.0 as min value and +0.0 as max value independently from which 0.0 value was saved in the statistics.

Author: Gabor Szadovszky <[email protected]>

Closes apache#461 from gszadovszky/PARQUET-1246 and squashes the following commits:

20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments
3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in case of NaN
gszadovszky pushed a commit to gszadovszky/parquet-mr that referenced this pull request Apr 24, 2018
Because of the ambigous sorting order of float/double the following changes made at the reading path of the related statistics:
- Ignoring statistics in case of it contains a NaN value.
- Using -0.0 as min value and +0.0 as max value independently from which 0.0 value was saved in the statistics.

Author: Gabor Szadovszky <[email protected]>

Closes apache#461 from gszadovszky/PARQUET-1246 and squashes the following commits:

20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments
3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in case of NaN

This change is based on 0a86429 but is not a clean cherry-pick.
zivanfi pushed a commit that referenced this pull request Apr 25, 2018
Because of the ambigous sorting order of float/double the following changes made at the reading path of the related statistics:
- Ignoring statistics in case of it contains a NaN value.
- Using -0.0 as min value and +0.0 as max value independently from which 0.0 value was saved in the statistics.

Author: Gabor Szadovszky <[email protected]>

Closes #461 from gszadovszky/PARQUET-1246 and squashes the following commits:

20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments
3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in case of NaN

This change is based on 0a86429 but is not a clean cherry-pick.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants