test: comprehensive eager indexing parity oracle + reusable indexing strategies by d-v-b · Pull Request #4109 · zarr-developers/zarr-python

d-v-b · 2026-06-30T13:55:06Z

🤖 AI text below 🤖

Standalone test-infrastructure improvement extracted from the lazy-indexing work (#3906). It hardens the existing (eager) indexing API and has no dependency on the lazy IndexTransform work, so it can land independently. No lazy-indexing tests are included here — those live with the lazy PR; this PR only sets up the harness so that work can reuse it.

What this adds

Reusable Hypothesis strategy (zarr.testing.strategies)

numpy_array_indexers(mode, shape) → a (zarr_selection, numpy_selection) pair for any of basic / oindex / vindex / mask — the element-space modes that have a direct NumPy-array oracle — so an indexing test is written once and parametrized over mode. (Block indexing is deliberately out of scope: it addresses the chunk grid, not array elements, and has no NumPy equivalent; it keeps its own block_indices strategy.)

NumPy-oracle parity tests (tests/test_properties.py), folding the old per-mode tests into a small set:

test_indexing_read_parity — every mode, for sync read, read into an out= buffer, and async read (adds out= coverage the per-mode tests lacked). The read oracle is dtype-strict (asserts dtype, not just values).
test_indexing_write_parity — every mode; ill-defined writes (a cell hit more than once after negative-index normalization) are filtered via _write_is_unambiguous, and the known sharded multi-axis-oindex limitation is an xfail, not a silent skip.
test_indexing_bounds_error_parity — out-of-bounds integer selections must raise in zarr iff they raise in NumPy (zarr's bounds errors subclass IndexError). Catches the silent-wrong-data class — returning garbage where NumPy would raise.
test_write_is_unambiguous — regression anchor for the duplicate-target filter.

Stateful model-lockstep harness (slow-gated; runs in the Slow Hypothesis job):

_IndexingStateMachine applies many random reads/writes to one array while a NumPy model tracks it in lockstep, asserting equality after every step — catching read-modify-write / chunk-boundary / shard-merge bugs the single-shot tests can't see. Runs over regular, sharded, and 3-D geometries.

These borrow the strongest ideas from TensorStore's driver_testutil (model lockstep) and ndindex's check_same (error parity + dtype-strict comparison).

Forward-compatibility (no lazy dependency)

The read oracle is consumer-agnostic — assert_read_matches_numpy(target, ...) where target is any zarr.Array — so once lazy indexing lands, the lazy PR can pass a lazy view through this exact harness instead of duplicating it.

Verified on main: full tests/test_properties.py green (16 passed, 3 skipped [the slow state machines], 1 xfailed); the Slow Hypothesis job exercises the state machines; ruff/mypy clean.

🤖 Generated with Claude Code

…strategies Extracted from the lazy-indexing work (PR zarr-developers#3906) as a standalone, eager-only improvement that hardens the *existing* indexing API and stands on its own. - zarr.testing.strategies gains two reusable, documented strategies: - indexers(mode, shape): one strategy returning a (zarr_selection, numpy_selection) pair for any of basic / oindex / vindex / mask, so indexing tests are written once and parametrized over mode. - windows(shape): non-negative, full-rank slice windows for sub-region tests. - test_properties.py folds the four per-mode indexing tests into a single NumPy-oracle test, test_indexing_parity, covering {basic, oindex, vindex, mask} for sync read, read into an out= buffer, async read, and write (with the historical per-mode write guards preserved). This adds out=-buffer coverage the per-mode tests lacked. - A consumer-agnostic helper (assert_read_matches_numpy) and a test_lazy_view_indexing_parity test are included but the latter is skipped unless Array.lazy exists, so the lazy-indexing feature can later reuse the same oracle without re-deriving any setup. Assisted-by: ClaudeCode:claude-opus-4.8

codecov · 2026-06-30T14:02:41Z

Codecov Report

❌ Patch coverage is 62.50000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.47%. Comparing base (1ab9953) to head (230bdff).

Files with missing lines	Patch %	Lines
src/zarr/testing/strategies.py	62.50%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4109      +/-   ##
==========================================
- Coverage   93.50%   93.47%   -0.04%     
==========================================
  Files          90       90              
  Lines       11981    11997      +16     
==========================================
+ Hits        11203    11214      +11     
- Misses        778      783       +5

Files with missing lines	Coverage Δ
src/zarr/testing/strategies.py	`93.85% <62.50%> (-1.54%)`	⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…nt mode collections - Type the `mode` parameter as the `IndexMode` literal (not `str`) across the test helpers (`_get`, `_async_get`, `_setitem`, `_eligible`, `assert_read_matches_numpy`). - Make `_VECTORIZED_MODES` a tuple like `_INDEX_MODES` (was a `frozenset`) for consistency; `_INDEX_MODES` must stay an ordered sequence for `sampled_from`. - `indexers(...)` now returns `tuple[Selection, Selection]` (the real `zarr.core.indexing.Selection` type) instead of `tuple[Any, Any]`. Assisted-by: ClaudeCode:claude-opus-4.8

dcherian · 2026-06-30T15:02:45Z

+    # write, mirroring the historical per-mode guards:
+    #   - vindex setitem is chunk-dependent for repeated coords (skipped)
+    #   - oindex/mask writes to sharded arrays are unsupported (GH2834)
+    #   - oindex writes with repeated indices are unspecified
+    if mode == "vindex":
+        return
+    if mode in ("oindex", "mask") and zarray.shards is not None:
+        return
+    if mode == "oindex" and _has_repeated_indices(npsel):
+        return


this is why i didn't do this haha.

From a quick glance I can't tell what makes it through.

yeah i hate this! i'm looking for something less dirty

…ted write Replace the brittle silent-`return` write guards with a principled split: - test_indexing_read_parity: reads (sync/out=/async), all modes, no guards. - test_indexing_write_parity: writes, filtering with `assume` only the cases with no well-defined oracle — repeated coordinates (order-dependent; a fundamental limitation shared with NumPy, not a bug). Now exercises vindex writes, sharded single-axis oindex/mask writes, and unsharded multi-axis oindex writes that the old blanket `assume(shards is None)` skipped. - test_oindex_sharded_multiaxis_write_xfail: pins the one genuinely-unsupported write (orthogonal across >=2 array axes on a sharded array, GH2834) as a strict xfail, visible at the pytest level. Assisted-by: ClaudeCode:claude-opus-4.8

…dex normalization CI slow-hypothesis falsified test_indexing_write_parity with a vindex write [2, -2, 0, 1] on length 4 (-2 -> 2, so cell 2 written twice — undefined, order-dependent). The old filter checked raw values and missed the post- normalization collision. Replace _has_repeated_indices with mode-aware _write_is_unambiguous(mode, npsel, shape) that normalizes negatives first (per-axis for oindex, coordinate-tuples for vindex). Test-oracle gap, not a bug. Assisted-by: ClaudeCode:claude-opus-4.8

… + dtype-strict oracle Borrow the strongest ideas from TensorStore's and ndindex's indexing test suites: - Stateful harness (TensorStore driver_testutil pattern, NumPy oracle): _IndexingStateMachine applies many random reads/writes (basic/oindex/vindex/mask) to one array while a NumPy model tracks it in lockstep, asserting equality after every step. Catches read-modify-write, chunk-boundary, and shard-merge bugs the single-shot parity tests cannot. Runs over regular/sharded/3D geometries, gated behind --run-slow-hypothesis like the store stateful tests. - Error parity (ndindex check_same): test_indexing_bounds_error_parity generates possibly-out-of-bounds integer selections and asserts zarr raises iff NumPy raises (zarr's bounds errors subclass IndexError). Catches the silent-wrong-data class — returning garbage where NumPy would raise. - Strict read oracle: assert dtype, not just values/shape, for ndim>=1 results. - Regression anchor: test_write_is_unambiguous pins the negative-index collision fix ([2,-2,0,1] -> cell 2 written twice). Assisted-by: ClaudeCode:claude-opus-4.8

…cope The strategy covers only the element-space indexing modes that have a direct NumPy-array oracle (basic/oindex/vindex/mask). Rename it so the name states that, and document that block indexing is deliberately excluded — it addresses the chunk grid (parametrized by the grid, not shape) and has no NumPy equivalent, so it lives in the separate block_indices strategy. Assisted-by: ClaudeCode:claude-opus-4.8

mkdocs renders single backticks for inline code; the docstrings used RST-style double backticks. Normalize to single backticks across the indexing strategies and property tests. Assisted-by: ClaudeCode:claude-opus-4.8

…ambiguous NumPy's deterministic last-write-wins is an implementation detail of serial, C-order index iteration, not a guarantee. zarr writes scalars into chunks and cannot in general promise a write order, so it cannot be expected to match NumPy's choice — hence no well-defined oracle for duplicate-target writes. Assisted-by: ClaudeCode:claude-opus-4.8

…ases Replace raw pytest.param tuples with the conftest Expect dataclass (input / output / id), matching the established idiom in test_chunk_grids etc. Assisted-by: ClaudeCode:claude-opus-4.8

test_lazy_view_indexing_parity is skipif(not hasattr(Array, "lazy")), so it never runs in this eager-only PR — dead weight that falsely reads as coverage. It (and the windows strategy, whose only consumer it was) belong with the lazy indexing work, not here. The consumer-agnostic assert_read_matches_numpy harness stays, so the lazy PR can re-add the view test on top. Assisted-by: ClaudeCode:claude-opus-4.8

dcherian

to be honest, I find this extremely hard to read with all the extra indirection. The duplication in the previous version made it far more readable. And given that this is all agent-generated, why not just duplicated and optimize for readability?

dcherian · 2026-07-01T15:16:05Z

+        self.zarray = zarr.create_array(
+            {}, shape=self.shape, chunks=self.chunks, shards=self.shards, dtype="i4"
+        )
+        self.zarray[...] = self.model


Why not use the generator here? and set self.model = self.array[:]

do you mean get our array values from a generator? I'm not sure that we want this for checking sequences of array indexing operations, since our assumption here is that indexing is insensitive to the scalar values. a deterministic sequence seems like a better set of initial values?

d-v-b · 2026-07-01T15:23:27Z

to be honest, I find this extremely hard to read with all the extra indirection. The duplication in the previous version made it far more readable. And given that this is all agent-generated, why not just duplicated and optimize for readability?

noted! the reason for the indirection is to facilitate testing of lazy indexing, which supports the numpy-style indexing semantics, and so requires the exact same set of tests. Without this indirection, I worried that we would have to write _lazy versions of every indexing method test, and this invites mistakes.

d-v-b · 2026-07-01T15:33:35Z

per @dcherian's recommendation, I'm going to go back to duplicated per-mode tests. We can deal with the cost of duplication (if any) when we start wiring up lazy indexing.

@dcherian

…view feedback) Address @dcherian's review on zarr-developers#4109: the single mode-dispatched oracle (test_indexing_read_parity / _write_parity going through _get / _setitem / assert_read_matches_numpy) was hard to read, and the stacked `assume()` write filters made it impossible to tell which selections were actually exercised. - Replace the two folded tests with explicit per-mode read/write tests (test_basic_read, test_oindex_read, ... test_mask_write). Each is self-contained, calls the real accessor directly, and still covers sync + async + out= reads. - Writes are now CONSTRUCTIVE instead of generate-then-`assume`-reject: oindex draws unique per-axis indices, vindex draws distinct coordinates via unravel_index, so every write is well-defined by construction (nothing hidden behind assume). - Drop the now-unused assert_read_matches_numpy / _async_get / _VECTORIZED_MODES. The mode-dispatch helpers that remain are used only by the stateful machine (which is inherently random-mode); noted as such. - State machine: set model from the array (`self.model = self.zarray[:]`) per review. The consumer-agnostic shared harness stays on the lazy PR (zarr-developers#3906), where a lazy view actually reuses it — introducing that abstraction at the point of second use. Assisted-by: ClaudeCode:claude-opus-4.8

…n (roborev zarr-developers#341) Same fix as on the lazy branch, applied to the state machine here (the per-mode write parity tests are already constructive). _write_is_unambiguous / _n_array_axes were fed the broadcast np.ix_-style npsel instead of the raw per-axis zsel, so for oindex the per-axis uniqueness check tripped on tiling and the state machine's write rule skipped every multi-axis orthogonal write. Pass zsel; clarify the helper docstrings; add test_write_filter_rejects_broadcast_oindex pinning the invariant. Assisted-by: ClaudeCode:claude-opus-4.8

github-actions Bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jun 30, 2026

d-v-b force-pushed the lazy-indexing-pr2-indexing-tests branch from 2ac5f75 to b75c702 Compare June 30, 2026 14:16

dcherian reviewed Jun 30, 2026

View reviewed changes

d-v-b added 2 commits June 30, 2026 17:15

maxrjones reviewed Jun 30, 2026

View reviewed changes

Comment thread src/zarr/testing/strategies.py Outdated

d-v-b added 6 commits June 30, 2026 21:05

docs: use single backticks in indexing test-infra docstrings

4a3c62f

mkdocs renders single backticks for inline code; the docstrings used RST-style double backticks. Normalize to single backticks across the indexing strategies and property tests. Assisted-by: ClaudeCode:claude-opus-4.8

test(indexing): use the Expect helper for test_write_is_unambiguous c…

e000228

…ases Replace raw pytest.param tuples with the conftest Expect dataclass (input / output / id), matching the established idiom in test_chunk_grids etc. Assisted-by: ClaudeCode:claude-opus-4.8

d-v-b marked this pull request as ready for review June 30, 2026 20:08

d-v-b requested a review from dcherian July 1, 2026 13:19

Merge branch 'main' into lazy-indexing-pr2-indexing-tests

65eeb6e

dcherian reviewed Jul 1, 2026

View reviewed changes

d-v-b added 2 commits July 1, 2026 17:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

test: comprehensive eager indexing parity oracle + reusable indexing strategies#4109

test: comprehensive eager indexing parity oracle + reusable indexing strategies#4109
d-v-b wants to merge 13 commits into
zarr-developers:mainfrom
d-v-b:lazy-indexing-pr2-indexing-tests

d-v-b commented Jun 30, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 30, 2026 •

edited

Loading

Uh oh!

dcherian Jun 30, 2026 •

edited

Loading

Uh oh!

d-v-b Jun 30, 2026

Uh oh!

Uh oh!

dcherian left a comment

Uh oh!

dcherian Jul 1, 2026

Uh oh!

d-v-b Jul 1, 2026

Uh oh!

d-v-b commented Jul 1, 2026

Uh oh!

d-v-b commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Uh oh!

Conversation

d-v-b commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this adds

Forward-compatibility (no lazy dependency)

Uh oh!

codecov Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dcherian Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

d-v-b Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dcherian left a comment

Choose a reason for hiding this comment

Uh oh!

dcherian Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

d-v-b Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

d-v-b commented Jul 1, 2026

Uh oh!

d-v-b commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

d-v-b commented Jun 30, 2026 •

edited

Loading

codecov Bot commented Jun 30, 2026 •

edited

Loading

dcherian Jun 30, 2026 •

edited

Loading