-
Notifications
You must be signed in to change notification settings - Fork 1.5k
PARQUET-1246: Ignore float/double statistics in case of NaN #461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Float max = stats.genericGetMax(); | ||
| // Drop min/max values in case of NaN as the sorting order of values is undefined for this case | ||
| if (min.isNaN() || max.isNaN()) { | ||
| stats.setMinMax(0.0f, 0.0f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unnecessary. On the other hand, this is the default value when it's unset, so I'm fine with leaving this.
| ((Statistics<?>) stats).hasNonNullValue = false; | ||
| } else { | ||
| // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped | ||
| if (min == 0.0f) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+0 and -0 are equal according to ==. Please use Float.compareTo() instead.
| // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped | ||
| if (min == 0.0f) { | ||
| stats.setMinMax(-0.0f, max); | ||
| min = -0.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) In my opinion,
min = -0.0f;
stats.setMinMax(min, max);
would be cleaner. Similar for the max case (although max is not used any more in this function).
| } | ||
|
|
||
| @Override | ||
| public Statistics<?> build() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as for the float case.
| assertEquals(max, stats.genericGetMax()); | ||
| } | ||
|
|
||
| @Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add test case for -0 and +0.
zivanfi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks. I'll merge this on Monday if nobody else requests any more changes.
Because of the ambigous sorting order of float/double the following changes made at the reading path of the related statistics: - Ignoring statistics in case of it contains a NaN value. - Using -0.0 as min value and +0.0 as max value independently from which 0.0 value was saved in the statistics. Author: Gabor Szadovszky <[email protected]> Closes apache#461 from gszadovszky/PARQUET-1246 and squashes the following commits: 20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments 3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in case of NaN
Because of the ambigous sorting order of float/double the following changes made at the reading path of the related statistics: - Ignoring statistics in case of it contains a NaN value. - Using -0.0 as min value and +0.0 as max value independently from which 0.0 value was saved in the statistics. Author: Gabor Szadovszky <[email protected]> Closes apache#461 from gszadovszky/PARQUET-1246 and squashes the following commits: 20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments 3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in case of NaN This change is based on 0a86429 but is not a clean cherry-pick.
Because of the ambigous sorting order of float/double the following changes made at the reading path of the related statistics: - Ignoring statistics in case of it contains a NaN value. - Using -0.0 as min value and +0.0 as max value independently from which 0.0 value was saved in the statistics. Author: Gabor Szadovszky <[email protected]> Closes #461 from gszadovszky/PARQUET-1246 and squashes the following commits: 20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments 3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in case of NaN This change is based on 0a86429 but is not a clean cherry-pick.
Because of the ambigous sorting order of float/double the following changes made at the reading path of the related statistics: