-
Notifications
You must be signed in to change notification settings - Fork 1k
[Variant] Define and use unshred_variant function #8481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CC @alamb @liamzwbao |
Reverting to draft temporarily, there are a bunch of changes in flight. |
This looks great -- thanks @scovich I will review the other PRs first Marking this as a draft per your other comment |
Update: Once all of the following prefactoring PR merge, I can rebase this PR and it will be ready for review:
I also have variant array unshredding support working locally, ~100LoC, which can either go into this PR or become a stacked PR on top. |
2ad9a36
to
0662e12
Compare
This should be ready for final review. List support will stack on top very shortly. |
@alamb one question -- Once this change lands, should we delete the unshredding code from
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 🏆 -- I think this approach is super elegant @scovich -- very nicely done
I think this is a much clearer mechanism than trying to unshred in vale
@alamb one question -- Once this change lands, should we delete the unshredding code from VariantArray::value, so that it can only work with unshredded binary variant?
I think this is my preference for now. My rationale is that with unshred_variant
there is now a way to get a Variant to work with if needed. And it has the nice property that it will work naturally with variant_get
❤️
If it turns out there is an important usecase for a row level unshredder (aka what value()
does today) we can also always add it back in
Overlap with cast_to_variant implementation
It seems to me like the unshred_variant
in this PR subsumes the code in cast_to_variant -- we could probably switch cast_to_variant to create a "dummy" variant array and call into unshred_variant
🤔
An optimization for another PR though I think
cc @liamzwbao @klion26 and @codephage2020
If we merge this, I think we should file follow on tickets for:
|
There's definitely some overlap and I may have missed some opportunities for builder code reuse, but the two functions are not doing the same thing:
|
I didn't mean to imply this PR should be changed
That is true, but I think most of the arrow types that are converted are primitive types (and thus have no metadata to convert -- the output array is just the same metadata (0 byte) repeated over and over
🤔 that is a good point |
I think now that #8519 has merged, it should be straightforward to coalesce some of the similar code between the two modules. Definitely a follow-up item tho. |
🚀 |
* NOTE: Stacked on #8481, ignore the first commit when reviewing. # Which issue does this PR close? - Closes #8337 # Rationale for this change Add a missing feature. # What changes are included in this PR? Leveraging the recently added `ListLikeArray` trait, support all five list types when unshredding variant data. # Are these changes tested? Yes -- all the list-related variant shredding integration tests pass now. # Are there any user-facing changes? No.
Which issue does this PR close?
Struct
#8336Time64(Microsecond)
#8334Rationale for this change
The
VariantArray::value
method was really inefficient but had quietly crept into important code paths likevariant_get
. Since the complex and inefficient code was in support of variant unshredding, we should just add and use a properunshred_variant
function (which uses row builders like the other variant manipulating functions).What changes are included in this PR?
unshred_variant
function, which does what it says. It supports all the typestyped_value_to_variant
supported, plus Time64 and Struct as a bonus. The former because it was ~10LoC and the latter because it demonstrates the superiority of this new approach vs. e.g. [Variant] VariantArray::value supports shredded struct access #8446variant_get
unshredding path to call it, which immediately benefits from all that function's existing test coverage.unshred_variant
instead of looping over rows callingvalue(i)
.Are these changes tested?
Yes, a bunch of variant integration tests now pass that used to fail.
Are there any user-facing changes?
Several new pub methods. I don't think any breaking changes.