52 commits
faa5ef5
feat: add IndexTransform library for composable, lazy coordinate mapp…
d-v-b Apr 14, 2026
732dddd
Merge branch 'main' into refactor/simplify-indexing
d-v-b Apr 14, 2026
79cd9c8
feat: add JSON serialization for IndexTransform (TensorStore-compatible)
d-v-b Apr 14, 2026
db31317
Merge branch 'main' into refactor/simplify-indexing
d-v-b Apr 15, 2026
797b25e
Merge branch 'main' into refactor/simplify-indexing
d-v-b Apr 16, 2026
9f50a4c
Merge branch 'main' into refactor/simplify-indexing
d-v-b Apr 21, 2026
53c0042
Merge branch 'main' into refactor/simplify-indexing
d-v-b Apr 22, 2026
1477f92
Merge branch 'main' into refactor/simplify-indexing
d-v-b May 6, 2026
e3927be
Merge branch 'main' of https://github.com/zarr-developers/zarr-python…
d-v-b May 6, 2026
2989756
Merge branch 'main' into refactor/simplify-indexing
d-v-b May 11, 2026
ed9167c
Merge branch 'main' into refactor/simplify-indexing
d-v-b May 18, 2026
0cf74f9
Merge branch 'main' into refactor/simplify-indexing
d-v-b May 19, 2026
ce78f48
rename accessor to lazy, and use result instead of resolve
d-v-b May 19, 2026
6bd22dc
Merge branch 'main' into refactor/simplify-indexing
d-v-b Jun 8, 2026
93e8c94
perf: cache Array shape/ndim and reuse chunk grid in transform resolvers
d-v-b Jun 29, 2026
ab795ca
perf: skip the transform resolver for eager (identity-transform) inde…
d-v-b Jun 29, 2026
b43023a
perf(lazy): resolve chunk_grid[coords] once per chunk in transform re…
d-v-b Jun 29, 2026
32cd3af
perf(lazy): single-pass sub_transform_to_selections with hoisted doma…
d-v-b Jun 29, 2026
f16ce13
fix(indexing): restore type narrowing and memoize shape on IndexDomain
d-v-b Jun 29, 2026
cd217c7
docs: update Array repr doctests for domain= suffix
d-v-b Jun 29, 2026
22db497
Merge branch 'main' into refactor/simplify-indexing
d-v-b Jun 29, 2026
a2091a3
test(lazy-indexing): dense parametrized matrix over 0-D / sharded / u…
d-v-b Jun 30, 2026
24aaaaf
test(lazy-indexing): model-based randomized round-trips over sharded/…
d-v-b Jun 30, 2026
c51ef58
fix(lazy): correct orthogonal (oindex) indexing across multiple array…
d-v-b Jun 30, 2026
6f1a829
fix(indexing): honor the transform for fancy/field-less selections on…
d-v-b Jun 30, 2026
d38205c
test(lazy-indexing): hypothesis property test for the lazy-view surface
d-v-b Jun 30, 2026
d9aad9f
test(indexing): comprehensive cross-path indexing parity harness
d-v-b Jun 30, 2026
3734357
test(strategies): reusable indexing-selection fixtures (indexers, win…
d-v-b Jun 30, 2026
889787a
fix(indexing): reject block selection on a lazy view instead of corru…
d-v-b Jun 30, 2026
2027a77
test(indexing): make the parity harness consumer-agnostic (eager-merg…
d-v-b Jun 30, 2026
8dc45e2
test(indexing): fold per-mode eager tests into the single parity oracle
d-v-b Jun 30, 2026
ab2e352
refactor(indexing): address review nits (comments, dead code, test co…
d-v-b Jun 30, 2026
57f3c00
fix(indexing): bounds-check lazy oindex/vindex array values
d-v-b Jun 30, 2026
ebf9f8f
test: type mode params and indexers return; consistent mode collections
d-v-b Jun 30, 2026
8aee921
test(indexing): split read/write parity; xfail the one known-unsuppor…
d-v-b Jun 30, 2026
734febd
test(indexing): filter writes whose targets collide after negative-in…
d-v-b Jun 30, 2026
dce5d46
test(indexing): stateful model-lockstep harness + bounds error-parity…
d-v-b Jun 30, 2026
93fb2d4
test(indexing): rename indexers -> numpy_array_indexers and clarify s…
d-v-b Jun 30, 2026
8153d48
docs: use single backticks in indexing test-infra docstrings
d-v-b Jun 30, 2026
04dab0a
docs: clarify why repeated-target writes are rejected in _write_is_un…
d-v-b Jun 30, 2026
a192ca4
test(indexing): use the Expect helper for test_write_is_unambiguous c…
d-v-b Jun 30, 2026
34d3b1b
feat(array): guard grid-describing members on lazy views (LazyViewError)
d-v-b Jul 1, 2026
16e96dc
feat(array): add view-aware chunk_projections partition API (Layer B)
d-v-b Jul 1, 2026
9b8da76
Merge branch 'main' into refactor/simplify-indexing
d-v-b Jul 1, 2026
8f51ead
test(indexing): fix the write filter to inspect the raw zarr selectio…
d-v-b Jul 1, 2026
97b0947
fix(lazy): make negative slice bounds literal — one consistent coordi…
d-v-b Jul 2, 2026
02a5c67
test(lazy): type the bounds-error trigger list for mypy
d-v-b Jul 2, 2026
767df8b
docs: add a lazy-indexing user guide (theory + executed patterns)
d-v-b Jul 2, 2026
76799cc
fix(transforms): slice bounds are literal coordinates at any domain o…
d-v-b Jul 2, 2026
e70056d
feat(lazy)!: TensorStore-parity domain semantics — preservation, one …
d-v-b Jul 2, 2026
816810d
docs(lazy): rewrite the lazy-indexing guide around TensorStore domain…
d-v-b Jul 2, 2026
ba97603
test(lazy): pin mask True-positions as absolute coordinates (roborev …
d-v-b Jul 2, 2026
294 changes: 294 additions & 0 deletions docs/user-guide/lazy_indexing.md
@@ -0,0 +1,294 @@
# Lazy indexing

Zarr arrays support *lazy* indexing through the `Array.lazy` accessor. Where
ordinary indexing reads or writes data immediately, `a.lazy[selection]` returns a
lightweight **view** — itself a `zarr.Array` — without touching storage. Views
compose, support orthogonal and coordinate selection, write through to the
backing array, and materialize on demand.

Zarr's lazy indexing follows [TensorStore's indexing
model](https://google.github.io/tensorstore/python/indexing.html): a view has a
**domain** — a box of coordinates — and every index is a **literal coordinate**
in that domain.

```python exec="true" session="lazy"
import numpy as np
import zarr
```

```python exec="true" session="lazy" source="above" result="ansi"
a = zarr.create_array(store="memory://lazy-demo", shape=(12,), chunks=(3,), dtype="int32")
a[...] = np.arange(12)

view = a.lazy[2:10] # no I/O happens here
print(view)
print(view.result()) # I/O happens here
```

## Theory: indexing as a declaration

Eager indexing is an *action*: `a[2:10]` performs I/O now and hands back the
bytes. Lazy indexing is a *declaration*: `a.lazy[2:10]` records **which cells
you mean** — an index transform mapping the view's coordinates to storage
coordinates — and defers the I/O. Because the selection is data rather than an
action, zarr can:

- **compose** it with further selections without reading anything,
- **write through** it (the same transform routes values back to storage),
- **plan** with it (`chunk_projections` enumerates exactly the stored chunks
the declaration touches).

### Domains are preserved: an index is a name, not a position

A view keeps the coordinates of the cells it selects. Slicing `[2:10]` does not
renumber anything — the view's domain *is* `[2, 10)`, and coordinate 3 still
means what it meant on the parent:

```python exec="true" session="lazy" source="above" result="ansi"
v = a.lazy[2:10]
print(v.shape) # (8,) — eight cells...
print(v.lazy[3].result()) # ...and coordinate 3 is still base cell 3
print(v.lazy[3:7].result()) # coordinates [3, 7) — literal, stable
```

This is what makes composition safe: a coordinate means the same cell no matter
how many views deep you are. `a.lazy[2:10].lazy[3:7]` and `a.lazy[3:7]` are the
same selection.
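
One way to see why the two spellings agree is to model a one-dimensional view as nothing more than a half-open interval of base coordinates. The sketch below is a toy model under that assumption, not zarr's implementation: `slice_domain` is a hypothetical helper, and it ignores edge cases such as empty intervals.

```python
def slice_domain(domain: tuple[int, int], start: int, stop: int) -> tuple[int, int]:
    """Apply a unit-step slice with literal bounds to a half-open domain.

    Toy model only: `slice_domain` is a hypothetical helper, not a zarr API.
    """
    lo, hi = domain
    if not (lo <= start <= stop <= hi):
        raise IndexError(f"[{start}, {stop}) is not contained in [{lo}, {hi})")
    # No renumbering: the new domain is the requested interval itself.
    return (start, stop)


base = (0, 12)                                        # a fresh 12-element array
nested = slice_domain(slice_domain(base, 2, 10), 3, 7)
direct = slice_domain(base, 3, 7)
print(nested, direct)  # (3, 7) (3, 7)
```

Because slicing never shifts the labels, the intermediate view only narrows what is allowed; it cannot change which cells a coordinate names.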

The price of stable names: **positions are not valid indices.** The first
element of `v` is coordinate 2, not 0 — and `v[0]` is an error, not the first
element:

```python exec="true" session="lazy" source="above" result="ansi"
try:
    v[0]
except IndexError as e:
    print(e)
```

To renumber a view explicitly, move its domain with `translate_to` (or shift it
with `translate_by`) — the data does not move, only the labels:

```python exec="true" session="lazy" source="above" result="ansi"
z = v.translate_to((0,)) # same cells, coordinates now [0, 8)
print(z.lazy[0].result()) # coordinate 0 -> base cell 2
w = a.translate_by((-10,)) # a view of `a` labeled [-10, 2)
print(w.lazy[-1].result()) # -1 is just another index: base cell 9
```
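
The relabeling can be pictured as bookkeeping on a domain plus an offset. The following is a minimal sketch under that assumption; `ToyView` and its fields are illustrative names, not part of zarr, though the method names mirror the ones used in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyView:
    """Toy 1-D view: coordinates in [lo, hi) map to base cell `coord + offset`."""

    lo: int
    hi: int
    offset: int = 0

    def translate_by(self, shift: int) -> "ToyView":
        # Labels move by `shift`; the offset absorbs it so the cells stay put.
        return ToyView(self.lo + shift, self.hi + shift, self.offset - shift)

    def translate_to(self, origin: int) -> "ToyView":
        return self.translate_by(origin - self.lo)

    def base_cell(self, coord: int) -> int:
        if not self.lo <= coord < self.hi:
            raise IndexError(f"coordinate {coord} is outside [{self.lo}, {self.hi})")
        return coord + self.offset


v_model = ToyView(2, 10)                      # models a.lazy[2:10]
z_model = v_model.translate_to(0)             # domain [0, 8)
w_model = ToyView(0, 12).translate_by(-10)    # domain [-10, 2)
print(z_model.base_cell(0), w_model.base_cell(-1))  # 2 9
```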

### `-1` is just another index

Because indices are literal coordinates, a negative index is *not* "from the
end" — it names the coordinate `-1`, which your domain may or may not contain.
On the translated view above, `-1` was perfectly valid. On a fresh array (domain
`[0, n)`), it is out of bounds — in **every** syntactic form: integers, slice
bounds, and index arrays are treated identically.

```python exec="true" session="lazy" source="above" result="ansi"
for make in (lambda: a.lazy[-1], lambda: a.lazy[-3:], lambda: a.lazy.oindex[[-1]]):
    try:
        make()
    except IndexError as e:
        print(type(e).__name__, "-", str(e).split(";")[0])
```

To select from the end, say what you mean literally:

```python exec="true" session="lazy" source="above" result="ansi"
n = a.shape[0]
print(a.lazy[n - 3 :].result()) # the last three elements
```

### No clamping: intervals must fit the domain

A slice interval must be contained in the domain — an out-of-range bound is an
error, not a silently shorter result. Empty intervals are the one exception:
they are valid anywhere. Reversed bounds are an error, not an empty result.

```python exec="true" session="lazy" source="above" result="ansi"
for sel in (slice(5, 100), slice(5, 2)):
    try:
        a.lazy[sel]
    except IndexError as e:
        print(type(e).__name__, "-", str(e).split(";")[0])
print(a.lazy[5:5].shape) # empty is fine, anywhere
```
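
The three rules (containment, empty intervals allowed anywhere, reversed bounds rejected) fit in a few lines. This is a sketch of the rules as stated above, not zarr's validation code; `check_interval` is a hypothetical name and the error messages are made up.

```python
def check_interval(domain: tuple[int, int], start: int, stop: int) -> tuple[int, int]:
    """Validate a unit-step slice against a half-open domain (hypothetical helper)."""
    lo, hi = domain
    if start > stop:
        raise IndexError(f"reversed bounds: [{start}, {stop})")
    if start == stop:
        return (start, stop)          # empty intervals are valid anywhere
    if start < lo or stop > hi:
        raise IndexError(f"[{start}, {stop}) is not contained in [{lo}, {hi})")
    return (start, stop)


for bounds in ((5, 100), (5, 2), (5, 5), (40, 40), (5, 12)):
    try:
        print(bounds, "->", check_interval((0, 12), *bounds))
    except IndexError as e:
        print(bounds, "-> IndexError:", e)
```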

### Strided views renumber by division

A strided slice produces a domain in *strided units*: for step `k`, the new
origin is `start / k` rounded toward zero, and coordinate `origin + i` maps to
base cell `start + i*k` (TensorStore's rule):

```python exec="true" session="lazy" source="above" result="ansi"
s = a.lazy[1:10:3] # base cells 1, 4, 7
print(s) # domain [0, 3)
print(s.lazy[1].result()) # coordinate 1 -> base cell 4
```
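
The arithmetic behind the rule can be worked through by hand. The sketch below assumes a positive step and `start <= stop`, and assumes the number of selected cells follows ordinary slice-length arithmetic; `strided_domain` and `base_cell` are hypothetical helpers, not zarr functions.

```python
import math


def strided_domain(start: int, stop: int, step: int) -> tuple[int, int]:
    """Domain of a strided slice under the rule described above.

    Hypothetical helper; assumes step > 0 and start <= stop.
    """
    origin = math.trunc(start / step)     # start / k, rounded toward zero
    size = -((start - stop) // step)      # ceil((stop - start) / step)
    return (origin, origin + size)


def base_cell(start: int, step: int, origin: int, coord: int) -> int:
    """Coordinate `origin + i` maps to base cell `start + i * step`."""
    return start + (coord - origin) * step


lo, hi = strided_domain(1, 10, 3)
cells = [base_cell(1, 3, lo, c) for c in range(lo, hi)]
print((lo, hi), cells)  # (0, 3) [1, 4, 7]
```

With a start that is not below the step, the origin is no longer zero: under the same assumptions, `strided_domain(7, 12, 2)` gives `(3, 6)`, and coordinates 3, 4, 5 map to base cells 7, 9, 11.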

### One coordinate system per view

Every way of indexing a view — `v[...]`, `v.lazy[...]`, `v.oindex`, `v.vindex`
— uses the same domain coordinates. (The base array's ordinary `a[...]` API is
unchanged: it keeps full NumPy semantics, negatives and all. The literal rules
apply to *views* and to the `.lazy` accessor.) NumPy-style zero-based access to
a view's data is spelled explicitly: materialize with `result()` /
`np.asarray`, or renumber with `translate_to`.

```python exec="true" session="lazy" source="above" result="ansi"
print(v[3], v.lazy[3].result()) # same coordinate, same cell
print(np.asarray(v)[0]) # materialized: NumPy rules apply
print(a[-1]) # base arrays keep NumPy semantics
```

## Common patterns

### Crop, analyze, crop again

```python exec="true" session="lazy" source="above" result="ansi"
img = zarr.create_array(store="memory://lazy-img", shape=(100, 100), chunks=(10, 10), dtype="float64")
img[...] = np.arange(100 * 100).reshape(100, 100)

crop = img.lazy[25:75, 25:75] # no I/O; domain [25,75) x [25,75)
inner = crop.lazy[35:65, 35:65] # coordinates are literal: this is img[35:65, 35:65]
print(crop.shape, inner.shape)
print(float(np.mean(inner))) # I/O happens here, for the inner crop only
```

### Write through a view

Assignment through the accessor, or through a view, routes values back to
storage — including strided and composed selections:

```python exec="true" session="lazy" source="above" result="ansi"
img.lazy[30:50, 40:60] = 0.0 # region write
tile = img.lazy[30:50, 40:60]
tile[30:35, 40:45] = 7.0 # write through the view, same coordinates
img.lazy[::2, ::2] = -1.0 # strided write, NumPy-equivalent cells
print(img[29:33, 39:43])
```
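
Write-through means a view holds a reference to its backing storage plus a coordinate rule, never a copy. A minimal sketch of that idea over a plain Python list follows; `ToyWriteView` is an illustrative class, not a zarr type.

```python
class ToyWriteView:
    """Toy 1-D view over a Python list, indexed by literal base coordinates."""

    def __init__(self, base: list[float], lo: int, hi: int) -> None:
        self.base, self.lo, self.hi = base, lo, hi

    def _check(self, coord: int) -> int:
        if not self.lo <= coord < self.hi:
            raise IndexError(f"coordinate {coord} is outside [{self.lo}, {self.hi})")
        return coord

    def __getitem__(self, coord: int) -> float:
        return self.base[self._check(coord)]

    def __setitem__(self, coord: int, value: float) -> None:
        # No copy is made: the write lands in the backing list.
        self.base[self._check(coord)] = value


storage = [0.0] * 12
toy_tile = ToyWriteView(storage, 3, 7)
toy_tile[4] = 7.0
print(storage[4], storage[3])  # 7.0 0.0
```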

### Orthogonal and coordinate selection

`lazy.oindex` selects an outer product per axis; `lazy.vindex` selects points.
Index-array *values* are domain coordinates; the dimension a fancy selection
*creates* gets a fresh `[0, n)` domain (there is no meaningful coordinate to
preserve for "the i-th pick"):

```python exec="true" session="lazy" source="above" result="ansi"
rows = img.lazy.oindex[[3, 17, 42], :] # picked dim: domain [0, 3); column dim preserved
sub = rows.lazy[:, 10:20] # column window, literal coords
print(rows.shape, sub.shape)

pts = img.lazy.vindex[[3, 17], [5, 9]] # two points -> fresh domain [0, 2)
print(pts.result())
```
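
The difference between the two modes is easiest to see on a small grid. The sketch below uses nested Python lists rather than zarr, so it illustrates only the selection semantics (outer product versus paired points), not the lazy machinery or the domains.

```python
grid = [[10 * r + c for c in range(5)] for r in range(4)]   # grid[r][c] == 10*r + c

picked_rows, picked_cols = [0, 3], [1, 4]

# Orthogonal (outer-product) selection: every picked row with every picked column.
outer = [[grid[r][c] for c in picked_cols] for r in picked_rows]

# Coordinate (point) selection: pair the index arrays element by element.
points = [grid[r][c] for r, c in zip(picked_rows, picked_cols)]

print(outer)   # [[1, 4], [31, 34]]
print(points)  # [1, 34]
```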

Boolean masks are array selections, so they go through `oindex`/`vindex`; the
positions of `True` values become **coordinates, counted from 0** — not offsets
from the view's origin. On a view whose domain starts at 2, a mask `True` at
position 3 addresses coordinate 3, and a `True` at position 0 or 1 is out of
the domain (matching TensorStore, where a mask is sugar for the coordinate
array of its `True` positions):

```python exec="true" session="lazy" source="above" result="ansi"
mask = np.zeros(12, dtype=bool)
mask[[1, 4, 6]] = True
print(a.lazy.oindex[mask].result())
```
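
Reading a mask as "the coordinate array of its `True` positions" can be spelled out directly. This is a sketch of that reading only; `mask_to_coordinates` and `check_in_domain` are hypothetical helpers, and the error message is made up.

```python
def mask_to_coordinates(mask: list[bool]) -> list[int]:
    """Positions of True values, counted from 0 (hypothetical helper)."""
    return [i for i, flag in enumerate(mask) if flag]


def check_in_domain(coords: list[int], domain: tuple[int, int]) -> None:
    """Raise if any coordinate falls outside the half-open domain."""
    lo, hi = domain
    bad = [c for c in coords if not lo <= c < hi]
    if bad:
        raise IndexError(f"coordinates {bad} are outside [{lo}, {hi})")


toy_mask = [False] * 12
for i in (1, 4, 6):
    toy_mask[i] = True

coords = mask_to_coordinates(toy_mask)
print(coords)                          # [1, 4, 6]
check_in_domain(coords, (0, 12))       # fine on a fresh 12-element array
try:
    check_in_domain(coords, (2, 10))   # coordinate 1 is not in [2, 10)
except IndexError as e:
    print(e)
```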

### Materializing

`view.result()`, `view[...]`, and `np.asarray(view)` are equivalent whole-view
reads; views also work directly with NumPy reductions. Views are **not**
iterable (iterate the materialized result instead):

```python exec="true" session="lazy" source="above" result="ansi"
w2 = a.lazy[3:9]
print(w2.result(), float(np.mean(w2)))
try:
    iter(w2)
except TypeError as e:
    print(e)
```

### Chunk-aware processing

`chunk_projections` enumerates the stored chunks a view touches. Each
projection reports:

- `coord` and `key`: which store object the chunk is,
- `shape`: the chunk's stored shape,
- `chunk_selection`: the region *within the chunk* that the view covers,
- `array_selection`: the region *of the view* it maps to (positional, that is,
  0-based into the view's extent),
- `is_partial`: whether the chunk is only partially covered (a partial write
  requires a read-modify-write).

```python exec="true" session="lazy" source="above" result="ansi"
for p in a.lazy[2:10].chunk_projections():
    print(p.key, p.chunk_selection, p.array_selection, p.is_partial)
print(a.lazy[2:10].is_chunk_aligned())
print(a.lazy[3:9].is_chunk_aligned()) # starts and ends on chunk boundaries
```
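
To make the fields concrete, here is a toy one-dimensional projection over a regular chunk grid. It is a sketch under simplifying assumptions (unit step, every chunk full-sized), and `ToyProjection` and `toy_chunk_projections` are hypothetical names that only borrow the field names described above.

```python
from typing import NamedTuple


class ToyProjection(NamedTuple):
    chunk_index: int
    chunk_selection: slice   # region within the chunk
    array_selection: slice   # positional region within the view
    is_partial: bool


def toy_chunk_projections(lo: int, hi: int, chunk: int) -> list[ToyProjection]:
    """Project the interval [lo, hi) onto a grid of `chunk`-sized chunks."""
    out: list[ToyProjection] = []
    if lo >= hi:
        return out
    for idx in range(lo // chunk, (hi - 1) // chunk + 1):
        c_lo = idx * chunk
        start, stop = max(lo, c_lo), min(hi, c_lo + chunk)
        out.append(
            ToyProjection(
                idx,
                slice(start - c_lo, stop - c_lo),   # offsets inside the chunk
                slice(start - lo, stop - lo),       # offsets inside the view
                stop - start < chunk,               # covered only in part
            )
        )
    return out


for proj in toy_chunk_projections(2, 10, 3):
    print(proj)
```

For the interval `[2, 10)` with chunks of 3, the first and last chunks come out partial and the two middle ones are fully covered; for `[3, 9)` no chunk is partial.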

This is the supported way to partition any selection for parallel or
chunk-at-a-time work — compose the selection through `.lazy`, then project.
Since `array_selection` is positional, re-zero the view (or materialize) to use
it:

```python exec="true" session="lazy" source="above" result="ansi"
crop0 = img.lazy[25:75, 25:75].translate_to((0, 0))
total = 0.0
for p in crop0.chunk_projections():
    total += float(np.sum(crop0[tuple(slice(s.start, s.stop) for s in p.array_selection)]))
print(total == float(np.sum(crop0)))
```

For sharded arrays, pass `unit="write"` to enumerate at shard (write-unit)
granularity; read-unit projections for sharded arrays are not yet implemented.

### What a view will not tell you

Members that describe the chunk grid assume the array *fills* its grid, which a
view generally does not. On a view they raise `zarr.errors.LazyViewError`
instead of silently describing the backing array:

```python exec="true" session="lazy" source="above" result="ansi"
try:
    v.chunks
except zarr.errors.LazyViewError as e:
    print(e)
```

Logical members (`shape`, `size`, `nbytes`, `dtype`, `attrs`, ...) reflect the
view; `metadata` and `chunk_grid` remain available and describe the *backing*
array.

## Coming from NumPy

- **A view's indices are coordinates, not positions.** `a.lazy[2:10]` is
indexed with 2..9, not 0..7. Renumber explicitly with
`view.translate_to((0, ...))` if you want positions.
- **Negative indices are not from-the-end** — in any form (integer, slice
bound, index array). They name literal coordinates, which fresh arrays'
domains (starting at 0) do not contain. Use `shape[dim] - k`, or translate
the domain so negative coordinates exist.
- **No clamping**: out-of-range slice bounds raise; reversed bounds raise;
only empty intervals are allowed anywhere.
- **No negative steps**: `a.lazy[::-1]` raises; reversal is not yet supported.
- **No `newaxis`**: `a.lazy[None]` raises; insert axes on the materialized
result instead.
- **The basic accessor takes basic selections only** (integers, slices,
ellipsis). Lists, arrays, and boolean masks go through `lazy.oindex` /
`lazy.vindex`.
- **Views are not iterable**; iterate `view.result()`.
- **Base arrays are unchanged**: `a[-1]`, `a[5:100]`, and friends keep full
NumPy semantics on non-view arrays.
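
For contrast, these are the eager behaviors that the list above says lazy views do not share. The snippet uses a plain Python list, whose integer and slice indexing follows the same rules as NumPy in these cases.

```python
data = list(range(12))

print(data[-1])        # 11: a negative index counts from the end
print(data[5:100])     # [5, 6, 7, 8, 9, 10, 11]: the upper bound is clamped
print(data[5:2])       # []: reversed bounds give an empty result
print(data[::-1][:3])  # [11, 10, 9]: a negative step reverses
```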

## Current limitations

- Negative slice steps (reversal) are not yet supported.
- Integer indexing a dimension *created by* an `oindex`/`vindex` selection
  (e.g. `rows.lazy[0]` after `rows = img.lazy.oindex[[3, 17, 42], :]`) is not
  yet supported reliably; slice the view instead (`rows.lazy[0:1]`).
- `chunk_projections(unit="read")` on sharded arrays (inner-chunk granularity)
is not yet implemented; use `unit="write"`.
- Views cannot be resized or appended to, and block selection is not defined
for views.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -15,6 +15,7 @@ nav:
- user-guide/index.md
- user-guide/installation.md
- user-guide/arrays.md
- user-guide/lazy_indexing.md
- user-guide/groups.md
- user-guide/attributes.md
- user-guide/storage.md
2 changes: 2 additions & 0 deletions src/zarr/__init__.py
@@ -35,6 +35,7 @@
zeros_like,
)
from zarr.core.array import Array, AsyncArray
from zarr.core.chunk_partition import ChunkProjection
from zarr.core.config import config
from zarr.core.group import AsyncGroup, Group

@@ -146,6 +147,7 @@ def set_format(log_format: str) -> None:
"Array",
"AsyncArray",
"AsyncGroup",
"ChunkProjection",
"Group",
"__version__",
"array",
8 changes: 4 additions & 4 deletions src/zarr/api/synchronous.py
@@ -1138,20 +1138,20 @@ def from_array(
... )
>>> arr2 = zarr.from_array(store, data=arr, overwrite=True)
>>> arr2
<Array file://example_from_array.zarr shape=(100, 100) dtype=int32>
<Array file://example_from_array.zarr shape=(100, 100) dtype=int32 domain={ [0, 100), [0, 100) }>
>>> asyncio.run(store.clear()) # Remove files generated by test

Create an array from an existing NumPy array:

>>> import numpy as np
>>> zarr.from_array({}, data=np.arange(10000, dtype="i4").reshape(100, 100))
<Array memory://... shape=(100, 100) dtype=int32>
<Array memory://... shape=(100, 100) dtype=int32 domain={ [0, 100), [0, 100) }>

Create an array from any array-like object:

>>> arr3 = zarr.from_array({}, data=[[1, 2], [3, 4]])
>>> arr3
<Array memory://... shape=(2, 2) dtype=int64>
<Array memory://... shape=(2, 2) dtype=int64 domain={ [0, 2), [0, 2) }>
>>> arr3[...]
array([[1, 2], [3, 4]])

@@ -1160,7 +1160,7 @@ def from_array(
>>> arr4 = zarr.from_array({}, data=[[1, 2], [3, 4]])
>>> arr5 = zarr.from_array({}, data=arr4, write_data=False)
>>> arr5
<Array memory://... shape=(2, 2) dtype=int64>
<Array memory://... shape=(2, 2) dtype=int64 domain={ [0, 2), [0, 2) }>
>>> arr5[...]
array([[0, 0], [0, 0]])
"""