Open
Labels: HDF parser (Non-kerchunk-based HDF parser), bug (Something isn't working), parsers
Description
I'm trying to virtualize a NetCDF file on S3 and am running into a _FillValue encoding issue. In the encode_cf_fill_value function, xarray's FillValueCoder is called as FillValueCoder.encode(fillvalue, target_dtype), which throws an AssertionError.
I dug in a bit, and two variables in the dataset seem to be causing this: date_written and time_written. Both have the fill value b'-', i.e. np.bytes_(b'-').
Has anyone run into anything like this before?
MRE:
import xarray as xr
from obstore.store import from_url
from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from xarray.backends.zarr import FillValueCoder
bucket = "s3://ncar-cesm2-arise/"
path = "ARISE-SAI-1.5/b.e21.BW.f09_g17.SSP245-TSMLT-GAUSS-DEFAULT.001/lnd/proc/tseries/day_1/b.e21.BW.f09_g17.SSP245-TSMLT-GAUSS-DEFAULT.001.clm2.h3.RAIN.20350101-20691231.nc"
url = f"{bucket}{path}"  # bucket already ends with "/"
store = from_url(bucket, region="us-east-2", skip_signature=True)
registry = ObjectStoreRegistry({bucket: store})
parser = HDFParser()
vds = open_virtual_dataset(
    url=url,
    parser=parser,
    registry=registry,
)

Versions:
zarr-python: 3.1.1
xarray: 2025.8.0
VirtualiZarr (vz): sha 7638893
The error is raised at `virtualizarr/parsers/utils.py:39`, in `encode_cf_fill_value(fill_value, target_dtype)`:
`encoded_fillvalue = FillValueCoder.encode(fillvalue, target_dtype)`
Traceback:
File ~/Documents/carbonplan/VirtualiZarr/virtualizarr/parsers/utils.py:39, in encode_cf_fill_value(fill_value, target_dtype)
     37     else:
     38         fillvalue = fill_value
---> 39     encoded_fillvalue = FillValueCoder.encode(fillvalue, target_dtype)
     40     return encoded_fillvalue
File ~/Documents/carbonplan/VirtualiZarr/.venv/lib/python3.13/site-packages/xarray/backends/zarr.py:127, in FillValueCoder.encode(cls, value, dtype)
    123 @classmethod
    124 def encode(cls, value: int | float | str | bytes, dtype: np.dtype[Any]) -> Any:
    125     if dtype.kind in "S":
    126         # byte string, this implies that 'value' must also be `bytes` dtype.
--> 127         assert isinstance(value, bytes)
    128         return base64.standard_b64encode(value).decode()
    129     elif dtype.kind in "b":
    130         # boolean
AssertionError:
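To narrow this down, here is a minimal sketch of the byte-string branch shown in the traceback, with the bare `assert` swapped for an error that reports what actually reached the encoder. `encode_bytes_fill_value` is a hypothetical helper name used only for illustration. It is not part of xarray or VirtualiZarr, and it only mirrors the two lines visible in the traceback (lines 127 and 128), not the rest of `FillValueCoder.encode`.

```python
import base64


def encode_bytes_fill_value(value):
    """Hypothetical stand-in for the `dtype.kind in "S"` branch in the traceback.

    Encodes a bytes fill value as a base64 string. Unlike the bare assert,
    it reports the offending type and value when the input is not bytes.
    """
    if not isinstance(value, bytes):
        raise TypeError(
            f"expected a bytes fill value, got {type(value).__name__}: {value!r}"
        )
    return base64.standard_b64encode(value).decode()


# A plain bytes fill value passes and is base64-encoded.
print(encode_bytes_fill_value(b"-"))  # LQ==

# Anything that is not bytes (here a str) is rejected with a readable message.
try:
    encode_bytes_fill_value("-")
except TypeError as err:
    print(err)
```

Running the same type check on the `fillvalue` that `encode_cf_fill_value` passes along for date_written and time_written might show which type trips the assertion.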