_FillValue / NetCDF parser encoding error #785

@norlandrhagen

I'm trying to virtualize a NetCDF file on S3 and running into a _FillValue encoding issue. In the encode_cf_fill_value function,
Xarray's FillValueCoder is called as FillValueCoder.encode(fillvalue, target_dtype), which throws an AssertionError.

I dug in a bit, and two variables in the dataset seem to be causing this: date_written and time_written. Both have the fill value b'-', i.e. np.bytes_(b'-').

Wondering if anyone has run into anything like this before.

MRE:

import xarray as xr
from obstore.store import from_url
from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from xarray.backends.zarr import FillValueCoder

bucket = "s3://ncar-cesm2-arise/"
path = "ARISE-SAI-1.5/b.e21.BW.f09_g17.SSP245-TSMLT-GAUSS-DEFAULT.001/lnd/proc/tseries/day_1/b.e21.BW.f09_g17.SSP245-TSMLT-GAUSS-DEFAULT.001.clm2.h3.RAIN.20350101-20691231.nc"
# bucket already ends with "/", so join without adding another separator
url = f"{bucket}{path}"
store = from_url(bucket, region="us-east-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket: store})
parser = HDFParser()

vds = open_virtual_dataset(
  url=url,
  parser=parser,
  registry=registry)

zarr-python: 3.1.1
xarray: 2025.8.0
vz: sha: 7638893

In `virtualizarr/parsers/utils.py:39, in encode_cf_fill_value(fill_value, target_dtype)`
`encoded_fillvalue = FillValueCoder.encode(fillvalue, target_dtype)`

File ~/Documents/carbonplan/VirtualiZarr/virtualizarr/parsers/utils.py:39, in encode_cf_fill_value(fill_value, target_dtype)
     37 else:
     38     fillvalue = fill_value
---> 39 encoded_fillvalue = FillValueCoder.encode(fillvalue, target_dtype)
     40 return encoded_fillvalue

File ~/Documents/carbonplan/VirtualiZarr/.venv/lib/python3.13/site-packages/xarray/backends/zarr.py:127, in FillValueCoder.encode(cls, value, dtype)
    123 @classmethod
    124 def encode(cls, value: int | float | str | bytes, dtype: np.dtype[Any]) -> Any:
    125     if dtype.kind in "S":
    126         # byte string, this implies that 'value' must also be `bytes` dtype.
--> 127         assert isinstance(value, bytes)
    128         return base64.standard_b64encode(value).decode()
    129     elif dtype.kind in "b":
    130         # boolean

AssertionError:
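
For illustration, here is a minimal, standard-library-only sketch of the kind of coercion that could run before the base64 step shown at lines 127-128 of the traceback. It is not VirtualiZarr's or Xarray's code: the helper names `coerce_fill_value_to_bytes` and `encode_bytes_fill_value` are hypothetical. The exact type of the value that reaches the assertion is not known from this report. Since np.bytes_ is itself a subclass of bytes, one guess is that the fill value arrives wrapped in an array-like scalar or as a str, which is the case this sketch tries to handle.

```python
import base64


def coerce_fill_value_to_bytes(value):
    """Return `value` as plain bytes, or raise TypeError if that is not possible.

    Hypothetical helper; the accepted input types are assumptions.
    """
    if isinstance(value, bytes):
        # Covers plain bytes and any bytes subclass.
        return bytes(value)
    if isinstance(value, str):
        return value.encode("utf-8")
    # Array-like scalars commonly expose .item() to unwrap a Python object.
    item = getattr(value, "item", None)
    if callable(item):
        unwrapped = item()
        if isinstance(unwrapped, (bytes, str)):
            return coerce_fill_value_to_bytes(unwrapped)
    raise TypeError(f"cannot interpret {type(value).__name__} as bytes")


def encode_bytes_fill_value(value):
    """Base64-encode a byte-string fill value, like the dtype.kind "S" branch."""
    raw = coerce_fill_value_to_bytes(value)
    return base64.standard_b64encode(raw).decode()


print(encode_bytes_fill_value(b"-"))
```

Raising a TypeError with the offending type name, rather than relying on a bare assert, would also make it easier to see what the fill values of date_written and time_written look like when they reach the encoder.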


Labels: HDF parser (Non-kerchunk-based HDF parser), bug (Something isn't working), parsers
