-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Variant] Add variant_get and Shredded VariantArray
#8021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 1 commit
Commits
Show all changes
18 commits
Select commit
Hold shift + click to select a range
3ba9fe0
[Variant] Strawman / infrastructure for variant_get of shredded values
alamb f458ec5
Apply suggestions from code review
alamb 52cf52b
Merge remote-tracking branch 'apache/main' into alamb/variant_shreddi…
alamb 6b66650
remove completed todo comment
alamb 924155c
Move validation logic to `ShreddingState::try_new`
alamb 5cb1068
Apply suggestions from code review
alamb 9c22f85
cleanup
alamb eee199e
Avoid panics in production code
alamb 264356b
Merge remote-tracking branch 'apache/main' into alamb/variant_shreddi…
alamb 336a0a3
comments
alamb 016cee1
Merge remote-tracking branch 'apache/main' into alamb/variant_shreddi…
alamb 3e8820b
Fix up merge
alamb 55a7ae6
Update parquet-variant-compute/src/variant_array.rs
alamb 85e7194
Merge branch 'alamb/variant_shredding_strawman' of github.com:alamb/a…
alamb 6e38490
fix compilation
alamb 0ccd0f8
update docs
alamb f8b2df4
Rename ShreddingState to conform better to spec
alamb 2355247
Update name in output builder
alamb File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Rename ShreddingState to conform better to spec
- Loading branch information
commit f8b2df44c2d77a1047de52766f3684672aeef5cf
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -80,8 +80,6 @@ impl VariantArray { | |||||||
| /// int8. | ||||||||
| /// | ||||||||
| /// Currently, only [`BinaryViewArray`] are supported. | ||||||||
| /// | ||||||||
| /// [`BinaryViewArray`]: arrow::array::BinaryViewArray | ||||||||
| pub fn try_new(inner: ArrayRef) -> Result<Self, ArrowError> { | ||||||||
| let Some(inner) = inner.as_struct_opt() else { | ||||||||
| return Err(ArrowError::InvalidArgumentError( | ||||||||
|
|
@@ -171,7 +169,7 @@ impl VariantArray { | |||||||
| ShreddingState::Unshredded { metadata, value } => { | ||||||||
| Variant::new(metadata.value(index), value.value(index)) | ||||||||
| } | ||||||||
| ShreddingState::FullyShredded { typed_value, .. } => { | ||||||||
| ShreddingState::Typed { typed_value, .. } => { | ||||||||
| if typed_value.is_null(index) { | ||||||||
| Variant::Null | ||||||||
| } else { | ||||||||
|
|
@@ -208,23 +206,45 @@ impl VariantArray { | |||||||
| } | ||||||||
| } | ||||||||
|
|
||||||||
| /// Variant arrays can be shredded in one of three states, encoded here | ||||||||
| /// Represents the shredding state of a [`VariantArray`] | ||||||||
| /// | ||||||||
| /// [`VariantArray`]s can be shredded according to the [Parquet Variant | ||||||||
| /// Shredding Spec]. Shredding means that the actual value is stored in a typed | ||||||||
| /// `typed_field` instead of the generic `value` field. | ||||||||
| /// | ||||||||
| /// Both value and typed_value are optional fields used together to encode a | ||||||||
| /// single value. Values in the two fields must be interpreted according to the | ||||||||
| /// following table (see [Parquet Variant Shredding Spec] for more details): | ||||||||
| /// | ||||||||
| /// | value | typed_value | Meaning | | ||||||||
| /// |----------|--------------|---------| | ||||||||
| /// | null | null | The value is missing; only valid for shredded object fields | | ||||||||
| /// | non-null | null | The value is present and may be any type, including `null` | | ||||||||
| /// | null | non-null | The value is present and is the shredded type | | ||||||||
| /// | non-null | non-null | The value is present and is a partially shredded object | | ||||||||
| /// | ||||||||
| /// [Parquet Variant Shredding Spec]: https://github.com/apache/parquet-format/blob/master/VariantShredding.md#value-shredding | ||||||||
| #[derive(Debug)] | ||||||||
| pub enum ShreddingState { | ||||||||
| // TODO: add missing state where there is neither value nor typed_value | ||||||||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||
| // Missing { metadata: BinaryViewArray }, | ||||||||
| /// This variant has no typed_value field | ||||||||
| Unshredded { | ||||||||
| metadata: BinaryViewArray, | ||||||||
| value: BinaryViewArray, | ||||||||
| }, | ||||||||
| /// This variant has a typed_value field and no value field | ||||||||
| /// meaning it is fully shredded (aka the value is stored in typed_value) | ||||||||
| FullyShredded { | ||||||||
| /// meaning it is the shredded type | ||||||||
| Typed { | ||||||||
| metadata: BinaryViewArray, | ||||||||
| typed_value: ArrayRef, | ||||||||
| }, | ||||||||
| /// This variant has both a value field and a typed_value field | ||||||||
| /// meaning it is partially shredded: first the typed_value is used, and | ||||||||
| /// if that is null, the value field is used. | ||||||||
| /// Partially shredded: | ||||||||
| /// * value is an object | ||||||||
| /// * typed_value is a shredded object. | ||||||||
| /// | ||||||||
| /// Note the spec says "Writers must not produce data where both value and | ||||||||
| /// typed_value are non-null, unless the Variant value is an object." | ||||||||
| PartiallyShredded { | ||||||||
| metadata: BinaryViewArray, | ||||||||
| value: BinaryViewArray, | ||||||||
|
|
@@ -246,7 +266,7 @@ impl ShreddingState { | |||||||
| typed_value, | ||||||||
| }), | ||||||||
| (metadata, Some(value), None) => Ok(Self::Unshredded { metadata, value }), | ||||||||
| (metadata, None, Some(typed_value)) => Ok(Self::FullyShredded { | ||||||||
| (metadata, None, Some(typed_value)) => Ok(Self::Typed { | ||||||||
| metadata, | ||||||||
| typed_value, | ||||||||
| }), | ||||||||
|
|
@@ -260,7 +280,7 @@ impl ShreddingState { | |||||||
| pub fn metadata_field(&self) -> &BinaryViewArray { | ||||||||
| match self { | ||||||||
| ShreddingState::Unshredded { metadata, .. } => metadata, | ||||||||
| ShreddingState::FullyShredded { metadata, .. } => metadata, | ||||||||
| ShreddingState::Typed { metadata, .. } => metadata, | ||||||||
| ShreddingState::PartiallyShredded { metadata, .. } => metadata, | ||||||||
| } | ||||||||
| } | ||||||||
|
|
@@ -269,7 +289,7 @@ impl ShreddingState { | |||||||
| pub fn value_field(&self) -> Option<&BinaryViewArray> { | ||||||||
| match self { | ||||||||
| ShreddingState::Unshredded { value, .. } => Some(value), | ||||||||
| ShreddingState::FullyShredded { .. } => None, | ||||||||
| ShreddingState::Typed { .. } => None, | ||||||||
| ShreddingState::PartiallyShredded { value, .. } => Some(value), | ||||||||
| } | ||||||||
| } | ||||||||
|
|
@@ -278,7 +298,7 @@ impl ShreddingState { | |||||||
| pub fn typed_value_field(&self) -> Option<&ArrayRef> { | ||||||||
| match self { | ||||||||
| ShreddingState::Unshredded { .. } => None, | ||||||||
| ShreddingState::FullyShredded { typed_value, .. } => Some(typed_value), | ||||||||
| ShreddingState::Typed { typed_value, .. } => Some(typed_value), | ||||||||
| ShreddingState::PartiallyShredded { typed_value, .. } => Some(typed_value), | ||||||||
| } | ||||||||
| } | ||||||||
|
|
@@ -290,10 +310,10 @@ impl ShreddingState { | |||||||
| metadata: metadata.slice(offset, length), | ||||||||
| value: value.slice(offset, length), | ||||||||
| }, | ||||||||
| ShreddingState::FullyShredded { | ||||||||
| ShreddingState::Typed { | ||||||||
| metadata, | ||||||||
| typed_value, | ||||||||
| } => ShreddingState::FullyShredded { | ||||||||
| } => ShreddingState::Typed { | ||||||||
| metadata: metadata.slice(offset, length), | ||||||||
| typed_value: typed_value.slice(offset, length), | ||||||||
| }, | ||||||||
|
|
||||||||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a future
FullyShreddedAllNullvariant (neithervaluenortyped_valuepresent), would we still need to store themetadataeven tho it's never actually used? 🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure
I filed a ticket to track adding AllNull:
ShreddingState::AllNullvariant #8088