-
Notifications
You must be signed in to change notification settings - Fork 526
feat: add UDFs for json #4577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add UDFs for json #4577
Conversation
Signed-off-by: Xuanwo <[email protected]>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4577 +/- ##
==========================================
- Coverage 80.99% 80.85% -0.15%
==========================================
Files 314 314
Lines 115727 116609 +882
Branches 115727 116609 +882
==========================================
+ Hits 93735 94283 +548
- Misses 18682 18952 +270
- Partials 3310 3374 +64
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick initial question. How do these compare to https://github.com/datafusion-contrib/datafusion-functions-json ?
json_get::json_get_udf(),
json_get_bool::json_get_bool_udf(),
json_get_float::json_get_float_udf(),
json_get_int::json_get_int_udf(),
json_get_json::json_get_json_udf(),
json_get_array::json_get_array_udf(),
json_as_text::json_as_text_udf(),
json_get_str::json_get_str_udf(),
json_contains::json_contains_udf(),
json_length::json_length_udf(),
json_object_keys::json_object_keys_udf(),Perhaps it would be better to implement them directly on |
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool, I have some suggestions but no major concerns.
| .iter() | ||
| .map(|arg| match arg { | ||
| ColumnarValue::Array(arr) => arr.clone(), | ||
| ColumnarValue::Scalar(scalar) => scalar.to_array().unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if we could avoid broadcasting the scalars here. Let's create a follow-up?
| match raw_jsonb.get_by_name(key, false) { | ||
| Ok(Some(value)) => return Ok(Some(value.as_raw().as_ref().to_vec())), | ||
| Ok(None) => {} | ||
| Err(e) => { | ||
| return Err(datafusion::error::DataFusionError::Execution(format!( | ||
| "Failed to get field: {}", | ||
| e | ||
| ))) | ||
| } | ||
| } | ||
|
|
||
| // Try as array index | ||
| if let Ok(index) = key.parse::<usize>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more efficient to determine if the key is a field or an integer once instead of each value? Although I guess it could be an integer-named field and so vary from value to value 😛
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
|
Catching some bugs from python side, working on it. |
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
|
I really don't like how I'm handling JSON and JSONB conventions in the code. Maybe I over-optimized too early, adding unnecessary complexity just to allow we can perform UDFs on JSONB instead of the entire JSON. I'm starting to reconsider @westonpace's suggestion to keep JSONB as part of the file format instead of treating it as a logical type. While it's working for now, we might as well keep moving forward and see how it holds up in the long run. We can always make adjustments later. I'll merge this PR first and do anyother changes in follow-up PRs. This will also unlock us to implement index over JSON. |
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
|
The follow-up tracked at #4595 |
Part of #4516
This PR will add UDFs for JSON so users can read JSON like:
This PR was primarily authored with Claude Code using Opus 4.1 and then hand-reviewed by me. I AM responsible for every change made in this PR. I aimed to keep it aligned with our goals, though I may have missed minor issues. Please flag anything that feels off, I'll fix it quickly.