[SPARK-53291][SQL] Fix nullability for value column
### What changes were proposed in this pull request?
For shredded Variant, we currently always mark the `value` column as nullable. However, when there is no corresponding `typed_value` and the value does not represent an object field (where null means the field is missing from the object), `value` is never null, so we can mark the column as required.
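The rule above can be sketched as a small standalone function. This is an illustration only, not the Spark implementation: the names `value_is_nullable` and `value_field`, and the two boolean parameters, are hypothetical, and the output is just the `value` field rendered in Parquet message-type notation.

```python
def value_is_nullable(has_typed_value: bool, is_object_field: bool) -> bool:
    """Return True if the shredded `value` column has to stay nullable.

    Hypothetical helper, not a Spark API. It encodes the rule described
    in this PR: `value` can only be null when a `typed_value` sibling
    exists, or when the value is an object field (null = field missing).
    """
    return has_typed_value or is_object_field


def value_field(has_typed_value: bool, is_object_field: bool) -> str:
    """Render the `value` field in Parquet message-type notation."""
    if value_is_nullable(has_typed_value, is_object_field):
        repetition = "optional"
    else:
        repetition = "required"
    return f"{repetition} binary value;"


# No typed_value and not an object field: `value` is never null.
print(value_field(has_typed_value=False, is_object_field=False))
# A typed_value sibling exists: `value` may be null.
print(value_field(has_typed_value=True, is_object_field=False))
# Object field: null means the field is missing from the object.
print(value_field(has_typed_value=False, is_object_field=True))
```

Only the first case changes under this PR. The other two keep the nullable `value` column that was already being written.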
### Why are the changes needed?
The current behavior shouldn't affect results as read by Spark, but the unnecessarily nullable column may make the Parquet file marginally larger. In addition, the [spec](https://github.com/apache/parquet-format/blob/master/VariantShredding.md) wording indicates that `value` must be required in these situations, so a strict reader could reject the schema as it is currently produced.
### Does this PR introduce _any_ user-facing change?
Yes. The Parquet file schema written for shredded Variant may change slightly: `value` is now required in the cases where it can never be null.
### How was this patch tested?
Unit test extended to cover this case.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52043 from cashmand/fix_nullability.
Authored-by: cashmand <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>