Skip to content

Conversation

@nathaniel-d-ef
Copy link
Contributor

@nathaniel-d-ef nathaniel-d-ef commented Oct 3, 2025

Which issue does this PR close?

Related to: #4886 (“Add Avro Support”)

Rationale for this change

Prior to this PR, the crate lacked sufficient support for named types.

This PR introduces fixes to the Avro reader and writer to ensure correct and robust roundtrip serialization of complex union types. The core issue was that the previous implementation failed to properly distinguish between logically distinct types within a union if they shared the same physical representation. This fundamental flaw led to valid union schemas/data being flagged as invalid, the loss of specific names of named type branches (e.g., "Fx8" becoming "fixed").

This PR makes a change whereby name and namespace data is registered and retrieved from metadata, which ensures that complex Avro unions can now be reliably read, converted to Arrow, and written back to Avro without validation errors or loss of type information.

This behavior will be further validated in a follow-up PR to add support for writing dense unions.

What changes are included in this PR?

  • A solution to the issue described above, whereby name and namespace metadata is registered and retrieved to support correct naming.
  • Test refactor to validate the behavior, including some reworking of TDD patterns to support the additional assertion data.
  • A new test file and round-trip test to validate the Avro paradigm of reuse of named types elsewhere in a schema (existing logic, code coverage missing until now).

Are these changes tested?

  • Yes, some existing tests were required to change to ensure assertions matched, as metadata is now included where it previously wasn't.
  • Additional test as mentioned relating to schema name references.

Are there any user-facing changes?

Crate not yet public

@github-actions github-actions bot added arrow Changes to the arrow crate arrow-avro arrow-avro crate labels Oct 3, 2025
@jecsand838 jecsand838 force-pushed the resolve_named_field branch from cdc38d4 to 72c2904 Compare October 4, 2025 20:18
@jecsand838
Copy link
Contributor

@mbrobbel @alamb If there's anyway this PR and #8550 can get into the 57.0.0 release that would be huge. After this there's just one last PR to add the remaining round trip tests. These two PRs will complete the remaining functionality though.

CC: @nathaniel-d-ef

@mbrobbel mbrobbel added this to the 57.0.0 milestone Oct 6, 2025
@jecsand838 jecsand838 force-pushed the resolve_named_field branch from f6bf96c to 6122ef9 Compare October 6, 2025 16:29
@jecsand838 jecsand838 force-pushed the resolve_named_field branch from a877ba5 to 45e816e Compare October 6, 2025 16:55
Copy link
Member

@mbrobbel mbrobbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbrobbel mbrobbel merged commit 2a8dfb3 into apache:main Oct 6, 2025
23 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 6, 2025

🚀

mbrobbel added a commit that referenced this pull request Oct 8, 2025
# Which issue does this PR close?

Relates to:
#8348
#4886

# Rationale for this change

This PR completes the efforts of @jecsand838, adding dense union support
to the encoder side of the crate, along with four other minor extensions
of existing time-related encoding.

Note: currently this PR is stacked behind
#8546. Once that's merged this
will be updated and will not include those changes.

# What changes are included in this PR?

- Dense union support for the writer
- Tests

# Are these changes tested?

- A full round-trip test, reading in an existing union avro file and
asserting that the output matches expectations
- Unit tests covering new encoders.

# Are there any user-facing changes?

Crate not yet public

---------

Co-authored-by: Connor Sanders <[email protected]>
Co-authored-by: Connor Sanders <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate arrow-avro arrow-avro crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants