Description
Hello everyone,
I'm trying to use VirtualiZarr version 2.0.1 to virtualize access to netCDF data files stored in an Azure storage container, but creating virtual datasets fails when the remote URLs are resolved.
The code below is a minimal example that creates a virtual dataset for a single netCDF file and raises the same error.
The remote file is accessible in the Azure storage container, and its remote URL resolves correctly when registry.resolve is called outside open_virtual_dataset. When open_virtual_dataset is run on the same file, however, the URL is mapped to a path on the local storage of the compute instance where the code is executed. That path does not exist, so the resolve call fails.
Code snippet with results (details removed):
import os
import sys
import fsspec
import glob
import adlfs
import obstore as obs
from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers.hdf import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
bucket = "abfs://" + os.environ["AZURE_STORAGE_CONTAINER"]  # env variable for Azure storage container
store = obs.store.from_url(
    bucket,
    account_name=os.environ["AZURE_STORAGE_ACCOUNT"],  # env variable for Azure storage account
    skip_signature=True,
)
parser = HDFParser()
registry = ObjectStoreRegistry({bucket: store})
f_url = f'abfs://<my_azure_storage_container>/<remote_path_to_netcdf_file>'
registry.resolve(url=f_url)
The call above resolves the remote URL to the file correctly:
AzureStore(container_name="<my_azure_storage_container>", account_name="<my_azure_storage_account>"),
'<remote_path_to_netcdf_file>'
but not inside open_virtual_dataset:
vds = open_virtual_dataset(
    url=_url,
    parser=parser,
    registry=registry,
    loadable_variables=[],
)
which raises this error:
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[...], line 1
----> 1 vds = open_virtual_dataset(
2 url=_url,
3 parser=parser,
4 registry=registry,
5 loadable_variables=[],
6 )
File [...]/lib/python3.12/site-packages/virtualizarr/xarray.py:87, in open_virtual_dataset(url, registry, parser, drop_variables, loadable_variables, decode_times)
45 """
46 Open an archival data source as an [xarray.Dataset][] wrapping virtualized zarr arrays.
47
(...) 83 in `loadable_variables` and normal lazily indexed arrays for each variable in `loadable_variables`.
84 """
85 filepath = validate_and_normalize_path_to_uri(url, fs_root=Path.cwd().as_uri())
---> 87 manifest_store = parser(
88 url=filepath,
89 registry=registry,
90 )
92 ds = manifest_store.to_virtual_dataset(
93 loadable_variables=loadable_variables,
94 decode_times=decode_times,
95 )
96 return ds.drop_vars(list(drop_variables or ()))
File [...]lib/python3.12/site-packages/virtualizarr/parsers/hdf/hdf.py:168, in HDFParser.__call__(self, url, registry)
147 def __call__(
148 self,
149 url: str,
150 registry: ObjectStoreRegistry,
151 ) -> ManifestStore:
152 """
153 Parse the metadata and byte offsets from a given HDF5/NetCDF4 file to produce a VirtualiZarr
154 [ManifestStore][virtualizarr.manifests.ManifestStore].
(...) 166 A [ManifestStore][virtualizarr.manifests.ManifestStore] which provides a Zarr representation of the parsed file.
167 """
--> 168 store, path_in_store = registry.resolve(url)
169 reader = ObstoreReader(store=store, path=path_in_store)
170 manifest_group = _construct_manifest_group(
171 filepath=url,
172 reader=reader,
173 group=self.group,
174 drop_variables=self.drop_variables,
175 )
File [...]lib/python3.12/site-packages/virtualizarr/registry.py:264, in ObjectStoreRegistry.resolve(self, url)
262 path_after_prefix = path.lstrip("/")
263 return store, path_after_prefix
--> 264 raise ValueError(f"Could not find an ObjectStore matching the url `{url}`")
ValueError: Could not find an ObjectStore matching the url `file:///mnt/batch/tasks/shared/LS_root/mounts/clusters/<path-to-local-storage-of-compute-instance>/abfs%3A/<my_azure_storage_container>/<remote_path_to_netcdf_file>`
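For reference, the shape of the URL in the error message (working directory, then `abfs%3A/` with a single slash) can be reproduced with the standard library alone. The sketch below is only my guess at how such a mapping could arise when a scheme is not recognized as remote. It is not VirtualiZarr's actual code: `to_uri`, `KNOWN_SCHEMES`, the `/work` root, and the placeholder URL are all hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical set of schemes a normalizer might recognize (an assumption,
# not taken from VirtualiZarr).
KNOWN_SCHEMES = {"s3", "gs", "http", "https", "file"}


def to_uri(url: str, fs_root: str) -> str:
    """Return `url` unchanged if its scheme is known; otherwise treat it as a
    relative local path under `fs_root` and convert it to a file:// URI."""
    if urlparse(url).scheme in KNOWN_SCHEMES:
        return url
    # pathlib drops the empty component in "//" and as_uri() percent-encodes
    # ":" as "%3A", which matches the pattern seen in the error message.
    return (PurePosixPath(fs_root) / url).as_uri()


remote = "abfs://container/data/file.nc"  # placeholder URL
normalized = to_uri(remote, fs_root="/work")
print(normalized)  # file:///work/abfs%3A/container/data/file.nc

# The normalized URL no longer starts with the registered prefix, so a
# prefix-based lookup keyed on "abfs://container" would not match it.
print(normalized.startswith("abfs://container"))  # False
```

If something like this is happening, it would explain why the registry lookup succeeds on the raw `abfs://` URL but fails on the normalized one.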
Any comment or suggestion is much appreciated.
Thank you!!