Commit 41a5097

Merge pull request #25 from janelia-cellmap/access_semantics
add `to_flat`, `from_flat`, `like`, and better handling for existing arrays / groups
2 parents d9dc1a2 + 9fa6b61 commit 41a5097

12 files changed, +1765 -821 lines changed
.github/workflows/test.yml

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+name: Linux Testing
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.10', '3.11', '3.12']
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install dependencies
+        shell: "bash -l {0}"
+        run: |
+          pip install poetry
+          poetry install
+      - name: Test
+        run: |
+          poetry run pytest

docs/api/v2.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+::: pydantic_zarr.v2

docs/api/v3.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+::: pydantic_zarr.v3

docs/usage_zarr_v2.md

Lines changed: 186 additions & 2 deletions
@@ -100,6 +100,190 @@ print(ArraySpec.from_array(np.arange(10)).model_dump())
 }
 """
 ```
+### Flattening and unflattening Zarr hierarchies
+
+In the previous section we built a model of a Zarr hierarchy by defining `GroupSpec` and `ArraySpec`
+instances, then providing those objects as `members` to the constructor of another `GroupSpec`. In
+other words, with this approach we create "child nodes" and give those nodes to the "parent node",
+recursively.
+
+Constructing deeply nested hierarchies this way can be tedious.
+For this reason, `pydantic-zarr` supports an alternative representation of the Zarr
+hierarchy in the form of a dictionary with `str` keys and `ArraySpec` / `GroupSpec` values, and
+methods to convert to / from these dictionaries.
+
+#### Making a `GroupSpec` object from a flat hierarchy
+
+This example demonstrates how to create a `GroupSpec` from a `dict` representation of a Zarr hierarchy.
+
+```python
+from pydantic_zarr.v2 import GroupSpec, ArraySpec
+# other than the key representing the root path "",
+# the keys must be valid paths in the Zarr storage hierarchy
+# note that the `members` attribute is `None` for the `GroupSpec` instances in this `dict`.
+tree = {
+    "": GroupSpec(members=None, attributes={"root": True}),
+    "/a": GroupSpec(members=None, attributes={"root": False}),
+    "/a/b": ArraySpec(shape=(10,10), dtype="uint8", chunks=(1,1))
+}
+
+print(GroupSpec.from_flat(tree).model_dump())
+"""
+{
+    'zarr_version': 2,
+    'attributes': {'root': True},
+    'members': {
+        'a': {
+            'zarr_version': 2,
+            'attributes': {'root': False},
+            'members': {
+                'b': {
+                    'zarr_version': 2,
+                    'attributes': {},
+                    'shape': (10, 10),
+                    'chunks': (1, 1),
+                    'dtype': '|u1',
+                    'fill_value': 0,
+                    'order': 'C',
+                    'filters': None,
+                    'dimension_separator': '/',
+                    'compressor': None,
+                }
+            },
+        }
+    },
+}
+"""
+```
+
+#### Flattening `GroupSpec` objects
+
+This is similar to the example above, except that we are working in reverse -- we are making the
+flat `dict` from the `GroupSpec` object.
+
+```python
+from pydantic_zarr.v2 import GroupSpec, ArraySpec
+# build the hierarchy from the leaves up: first the array, then the groups that contain it
+# `to_flat` keys each node by its path; the root group gets the key ""
+# note that the `members` attribute is `None` for the `GroupSpec` instances in the flattened `dict`.
+
+a_b = ArraySpec(shape=(10,10), dtype="uint8", chunks=(1,1))
+a = GroupSpec(members={'b': a_b}, attributes={"root": False})
+root = GroupSpec(members={'a': a}, attributes={"root": True})
+
+print(root.to_flat())
+"""
+{
+    '': GroupSpec(zarr_version=2, attributes={'root': True}, members=None),
+    '/a': GroupSpec(zarr_version=2, attributes={'root': False}, members=None),
+    '/a/b': ArraySpec(
+        zarr_version=2,
+        attributes={},
+        shape=(10, 10),
+        chunks=(1, 1),
+        dtype='|u1',
+        fill_value=0,
+        order='C',
+        filters=None,
+        dimension_separator='/',
+        compressor=None,
+    ),
+}
+"""
+```
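
The flattening step documented above can be illustrated without the library. The following is only a minimal sketch that uses plain dictionaries in place of `GroupSpec` / `ArraySpec`; the `flatten` helper is hypothetical and is not how `pydantic-zarr` implements `to_flat`.

```python
from typing import Any


def flatten(node: dict[str, Any], path: str = "") -> dict[str, dict[str, Any]]:
    """Hypothetical helper: map every node of a nested hierarchy to its path."""
    if "members" in node:
        # A group is stored without its children, mirroring `members=None`.
        flat = {path: {**node, "members": None}}
        for name, child in (node["members"] or {}).items():
            flat.update(flatten(child, f"{path}/{name}"))
        return flat
    # An array is a leaf, so it is copied unchanged.
    return {path: dict(node)}


# Plain-dict stand-in for the nested hierarchy used in the docs example.
nested = {
    "attributes": {"root": True},
    "members": {
        "a": {
            "attributes": {"root": False},
            "members": {"b": {"shape": (10, 10), "dtype": "uint8"}},
        }
    },
}

flat = flatten(nested)
print(sorted(flat))  # ['', '/a', '/a/b']
```

The root group ends up under the empty-string key and every other node under its absolute path, which is the same key layout as the documented `to_flat` output.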
+
+#### Implicit groups
+`zarr-python` supports creating Zarr arrays or groups deep in the
+hierarchy without explicitly creating the intermediate groups first.
+`from_flat` models this behavior. For example, `{'/a/b/c': ArraySpec(...)}` implicitly defines the existence of groups named `a` and `b` (where `b` is contained in `a`). `from_flat` will create the expected `GroupSpec` object from such `dict` instances.
+
+```python
+from pydantic_zarr.v2 import GroupSpec, ArraySpec
+tree = {'/a/b/c': ArraySpec(shape=(1,), dtype='uint8', chunks=(1,))}
+print(GroupSpec.from_flat(tree).model_dump())
+"""
+{
+    'zarr_version': 2,
+    'attributes': {},
+    'members': {
+        'a': {
+            'zarr_version': 2,
+            'attributes': {},
+            'members': {
+                'b': {
+                    'zarr_version': 2,
+                    'attributes': {},
+                    'members': {
+                        'c': {
+                            'zarr_version': 2,
+                            'attributes': {},
+                            'shape': (1,),
+                            'chunks': (1,),
+                            'dtype': '|u1',
+                            'fill_value': 0,
+                            'order': 'C',
+                            'filters': None,
+                            'dimension_separator': '/',
+                            'compressor': None,
+                        }
+                    },
+                }
+            },
+        }
+    },
+}
+"""
+```
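
The implicit-group behavior can be sketched in the same library-free way. This is an illustration under stated assumptions: nodes are plain dictionaries, a node counts as a group when it has a `members` key, and `new_group` / `unflatten` are hypothetical names rather than `pydantic-zarr` API.

```python
from typing import Any


def new_group() -> dict[str, Any]:
    """Hypothetical stand-in for an empty group."""
    return {"attributes": {}, "members": {}}


def unflatten(flat: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: rebuild a nested hierarchy from path keys."""
    root = new_group()
    # Visit shallow paths first so explicit groups exist before their children.
    for path in sorted(flat, key=lambda p: p.count("/")):
        node = flat[path]
        if path == "":
            root["attributes"] = dict(node.get("attributes", {}))
            continue
        *parents, name = path.strip("/").split("/")
        cursor = root
        for part in parents:
            # Create any intermediate group that was never listed explicitly.
            cursor = cursor["members"].setdefault(part, new_group())
        if "members" in node:
            group = new_group()
            group["attributes"] = dict(node.get("attributes", {}))
            cursor["members"][name] = group
        else:
            # An array is a leaf, so it is copied unchanged.
            cursor["members"][name] = dict(node)
    return root


tree = unflatten({"/a/b/c": {"shape": (1,), "dtype": "uint8"}})
print(list(tree["members"]["a"]["members"]["b"]["members"]))  # ['c']
```

Only `/a/b/c` was supplied, yet the result contains groups `a` and `b` with empty attributes, matching the shape of the documented `from_flat` output. The sketch does not handle a path whose parent is an array.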
+
+## Comparing `GroupSpec` and `ArraySpec` models
+
+`GroupSpec` and `ArraySpec` both have `like` methods that take another `GroupSpec` or `ArraySpec` as an argument and return `True` (the models are like each other) or `False` (the models are not like each other).
+
+The `like` method works by converting both input models to `dict` via `pydantic.BaseModel.model_dump`, and comparing the `dict` representation of the models. This means that instances of two different subclasses of `GroupSpec`, which would not be considered equal according to the `==` operator, will be considered `like` if and only if they serialize to identical `dict` instances.
+
+The `like` method also takes keyword arguments `include` and `exclude`, which explicitly include attributes in, or exclude them from, the model comparison. So it's possible to use `like` to check if two `ArraySpec` instances have the same `shape` and `dtype` by calling `array_a.like(array_b, include={'shape', 'dtype'})`. This is useful if you don't care about the compressor or filters and just want to ensure that you can safely write an in-memory array to a Zarr array.
+
+```python
+from pydantic_zarr.v2 import ArraySpec, GroupSpec
+import zarr
+arr_a = ArraySpec(shape=(1,), dtype='uint8', chunks=(1,))
+arr_b = ArraySpec(shape=(2,), dtype='uint8', chunks=(1,)) # array with different shape
+
+print(arr_a.like(arr_b)) # False, because of mismatched shape
+#> False
+
+print(arr_a.like(arr_b, exclude={'shape'})) # True, because we exclude shape.
+#> True
+
+# `ArraySpec.like` will convert a zarr.Array to ArraySpec
+store = zarr.MemoryStore()
+arr_a_stored = arr_a.to_zarr(store, path='arr_a') # this is a zarr.Array
+
+print(arr_a.like(arr_a_stored)) # arr_a is like the zarr.Array version of itself
+#> True
+
+print(arr_b.like(arr_a_stored)) # False, because of mismatched shape
+#> False
+
+print(arr_b.like(arr_a_stored, exclude={'shape'})) # True, because we exclude shape.
+#> True
+
+# the same thing for groups
+g_a = GroupSpec(attributes={'foo': 10}, members={'a': arr_a, 'b': arr_b})
+g_b = GroupSpec(attributes={'foo': 11}, members={'a': arr_a, 'b': arr_b})
+
+print(g_a.like(g_a)) # g_a is like itself
+#> True
+
+print(g_a.like(g_b)) # False, because of mismatched attributes
+#> False
+
+print(g_a.like(g_b, exclude={'attributes'})) # True, because we ignore attributes
+#> True
+
+print(g_a.like(g_a.to_zarr(store, path='g_a'))) # g_a is like its zarr.Group counterpart
+#> True
+```
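
The "dump both models, then compare" idea behind `like` can be shown with the standard library alone. This sketch assumes a dataclass stand-in (`ArrayModel`) and `dataclasses.asdict` in place of `model_dump`; both the stand-in and the free function `like` are hypothetical and only mirror the described semantics of `include` and `exclude`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ArrayModel:
    """Hypothetical stand-in for an array model."""
    shape: tuple[int, ...]
    dtype: str
    chunks: tuple[int, ...]
    attributes: dict[str, Any] = field(default_factory=dict)


def like(
    a: ArrayModel,
    b: ArrayModel,
    include: Optional[set[str]] = None,
    exclude: Optional[set[str]] = None,
) -> bool:
    """Compare two models by their dict form, optionally restricting the fields."""

    def dump(model: ArrayModel) -> dict[str, Any]:
        data = asdict(model)
        if include is not None:
            data = {k: v for k, v in data.items() if k in include}
        if exclude is not None:
            data = {k: v for k, v in data.items() if k not in exclude}
        return data

    return dump(a) == dump(b)


arr_a = ArrayModel(shape=(1,), dtype="uint8", chunks=(1,))
arr_b = ArrayModel(shape=(2,), dtype="uint8", chunks=(1,))

print(like(arr_a, arr_b))                              # False
print(like(arr_a, arr_b, exclude={"shape"}))           # True
print(like(arr_a, arr_b, include={"dtype", "chunks"}))  # True
```

Because the comparison operates on the dumped dictionaries rather than on the objects, two instances of different classes would compare as alike whenever their dumps match.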
 
 ## Using generic types
 
@@ -132,7 +316,7 @@ except ValidationError as exc:
 1 validation error for GroupSpec[GroupAttrs, ~TItem]
 attributes.b
   Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='foo', input_type=str]
-    For further information visit https://errors.pydantic.dev/2.4/v/int_parsing
+    For further information visit https://errors.pydantic.dev/2.6/v/int_parsing
 """
 
 # this passes validation
@@ -151,7 +335,7 @@ except ValidationError as exc:
 1 validation error for GroupSpec[~TAttr, ArraySpec]
 members.foo
   Input should be a valid dictionary or instance of ArraySpec [type=model_type, input_value=GroupSpec(zarr_version=2,...tributes={}, members={}), input_type=GroupSpec]
-    For further information visit https://errors.pydantic.dev/2.4/v/model_type
+    For further information visit https://errors.pydantic.dev/2.6/v/model_type
 """
 
 # this passes validation

mkdocs.yaml

Lines changed: 11 additions & 2 deletions
@@ -30,14 +30,23 @@ nav:
   - About: index.md
   - Usage (Zarr V3): usage_zarr_v3.md
   - Usage (Zarr V2): usage_zarr_v2.md
-  - API: api/core.md
+  - API:
+    - core: api/core.md
+    - v2: api/v2.md
+    - v3: api/v3.md
 
 plugins:
 - mkdocstrings:
     handlers:
       python:
         options:
-          show_signature_annotations: true
+          docstring_style: numpy
+          members_order: source
+          separate_signature: true
+          filters: ["!^_"]
+          docstring_options:
+            ignore_init_summary: true
+            merge_init_into_class: true
 
 markdown_extensions:
 - pymdownx.highlight:
