
@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Dec 11, 2025

Rationale for this change

The chunked_arrays hypothesis strategy had a workaround that excluded struct types on the assumption that field metadata was not preserved (added in dd0988b).

Testing confirms that field metadata is now correctly preserved in chunked arrays with struct types, so the workaround is no longer necessary. The underlying issue was fixed by d06c664.

The code now calls CChunkedArray::Make() explicitly instead of constructing a CChunkedArray manually.

What changes are included in this PR?

Remove the workaround in the chunked_arrays strategy that excluded struct types on the assumption that field metadata is not preserved.

Are these changes tested?

Manually tested that struct field metadata is preserved when creating chunked arrays, using the following script (generated by ChatGPT):

import sys
import pyarrow as pa

# Create a struct type with custom field metadata
struct_type = pa.struct([
    pa.field('a', pa.int32(), metadata={'custom_key': 'custom_value_a', 'description': 'field a'}),
    pa.field('b', pa.string(), metadata={'custom_key': 'custom_value_b', 'description': 'field b'})
])

print("=== Original struct type ===")
print(f"Type: {struct_type}")
print(f"Field 'a' metadata: {struct_type[0].metadata}")
print(f"Field 'b' metadata: {struct_type[1].metadata}")
print()

# Create arrays with this struct type
arr1 = pa.array([
    {'a': 1, 'b': 'foo'},
    {'a': 2, 'b': 'bar'}
], type=struct_type)

arr2 = pa.array([
    {'a': 3, 'b': 'baz'},
    {'a': 4, 'b': 'qux'}
], type=struct_type)

print("=== Individual arrays ===")
print(f"arr1.type: {arr1.type}")
print(f"arr1.type[0].metadata: {arr1.type[0].metadata}")
print(f"arr2.type: {arr2.type}")
print(f"arr2.type[0].metadata: {arr2.type[0].metadata}")
print()

# Create chunked array WITH explicit type parameter (preserves metadata)
chunked_with_type = pa.chunked_array([arr1, arr2], type=struct_type)

print("=== Chunked array (with explicit type) ===")
print(f"Type: {chunked_with_type.type}")
print(f"Field 'a' metadata: {chunked_with_type.type[0].metadata}")
print(f"Field 'b' metadata: {chunked_with_type.type[1].metadata}")
print()

# Verify metadata is preserved
if (chunked_with_type.type[0].metadata == struct_type[0].metadata and
    chunked_with_type.type[1].metadata == struct_type[1].metadata):
    print("✓ SUCCESS: Field metadata IS preserved!")
    print(f"  Field 'a': {dict(chunked_with_type.type[0].metadata)}")
    print(f"  Field 'b': {dict(chunked_with_type.type[1].metadata)}")
    exit_code = 0
else:
    print("✗ FAILED: Field metadata was lost")
    exit_code = 1

print()
print("=== Test without explicit type (for comparison) ===")
# What happens without explicit type? (inferred from first chunk)
chunked_without_type = pa.chunked_array([arr1, arr2])
print(f"Type: {chunked_without_type.type}")
print(f"Field 'a' metadata: {chunked_without_type.type[0].metadata}")
print(f"Field 'b' metadata: {chunked_without_type.type[1].metadata}")

if chunked_without_type.type[0].metadata == struct_type[0].metadata:
    print("  → Metadata preserved even without explicit type (from first chunk)")
else:
    print("  → Note: Metadata was NOT preserved without an explicit type")

# Report the explicit-type check result as the process exit status
sys.exit(exit_code)

Are there any user-facing changes?

No, test-only.

@github-actions

⚠️ GitHub issue #48442 has been automatically assigned in GitHub to PR creator.

@raulcd
Member

raulcd commented Dec 11, 2025

@github-actions crossbow submit test-conda-python-3.11-hypothesis

@github-actions

Revision: 1c29350

Submitted crossbow builds: ursacomputing/crossbow @ actions-dd158bff76

Task Status
test-conda-python-3.11-hypothesis GitHub Actions

Member

@raulcd raulcd left a comment


Thanks @HyukjinKwon for your PRs; this is really appreciated. Our current review capacity is rather small, so it might take us some time to go over them!
I know it's not related to:

But it would be nice to have a passing hypothesis CI job before merging changes to the hypothesis tests :) Those jobs have been failing for a couple of months now.

@github-actions github-actions bot added the awaiting changes label and removed the awaiting review label Dec 11, 2025
@HyukjinKwon
Member Author

It's more of a band-aid fix, but #48460 should fix it! 👍

@HyukjinKwon
Member Author

@github-actions crossbow submit test-conda-python-3.11-hypothesis

@github-actions

Revision: 1c29350

Submitted crossbow builds: ursacomputing/crossbow @ actions-bc8ad81dd8

Task Status
test-conda-python-3.11-hypothesis GitHub Actions

@HyukjinKwon
Member Author

@github-actions crossbow submit test-conda-python-3.11-hypothesis

@github-actions github-actions bot added the awaiting change review label and removed the awaiting changes label Dec 12, 2025
@github-actions

Revision: ec6acb2

Submitted crossbow builds: ursacomputing/crossbow @ actions-768e6f52a4

Task Status
test-conda-python-3.11-hypothesis GitHub Actions

@HyukjinKwon
Member Author

Pushed again to retrigger the test. The hypothesis build itself passes (#48443 (comment)).

@HyukjinKwon
Member Author

Seems like:

tests/test_extension_type.py .................                           [ 40%]
Fatal Python error: Segmentation fault

Current thread 0x0000000203059040 (most recent call first):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/tests/test_fs.py", line 1224 in test_s3_options
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 166 in pytest_pyfunc_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 1720 in runtest
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 245 in <lambda>
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 353 in from_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 244 in call_and_report
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 137 in runtestprotocol
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 396 in pytest_runtestloop
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 372 in _main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 318 in wrap_session
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 365 in pytest_cmdline_main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 199 in main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 223 in console_main
  File "/Users/runner/hostedtoolcache/Python/3.11.9/arm64/bin/pytest", line 7 in <module>
tests/test_fs.py ....sssx.xsss....sssx.xssss

The macOS failure is happening globally. I retriggered the build, but the issue persists. I'll leave it as is for now; it isn't related to my change in any event.
