test: comprehensive eager indexing parity oracle + reusable indexing strategies#4109
test: comprehensive eager indexing parity oracle + reusable indexing strategies#4109d-v-b wants to merge 13 commits into
Conversation
…strategies Extracted from the lazy-indexing work (PR zarr-developers#3906) as a standalone, eager-only improvement that hardens the *existing* indexing API and stands on its own. - zarr.testing.strategies gains two reusable, documented strategies: - indexers(mode, shape): one strategy returning a (zarr_selection, numpy_selection) pair for any of basic / oindex / vindex / mask, so indexing tests are written once and parametrized over mode. - windows(shape): non-negative, full-rank slice windows for sub-region tests. - test_properties.py folds the four per-mode indexing tests into a single NumPy-oracle test, test_indexing_parity, covering {basic, oindex, vindex, mask} for sync read, read into an out= buffer, async read, and write (with the historical per-mode write guards preserved). This adds out=-buffer coverage the per-mode tests lacked. - A consumer-agnostic helper (assert_read_matches_numpy) and a test_lazy_view_indexing_parity test are included but the latter is skipped unless Array.lazy exists, so the lazy-indexing feature can later reuse the same oracle without re-deriving any setup. Assisted-by: ClaudeCode:claude-opus-4.8
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4109 +/- ##
==========================================
- Coverage 93.50% 93.47% -0.04%
==========================================
Files 90 90
Lines 11981 11997 +16
==========================================
+ Hits 11203 11214 +11
- Misses 778 783 +5
🚀 New features to boost your workflow:
|
…nt mode collections - Type the `mode` parameter as the `IndexMode` literal (not `str`) across the test helpers (`_get`, `_async_get`, `_setitem`, `_eligible`, `assert_read_matches_numpy`). - Make `_VECTORIZED_MODES` a tuple like `_INDEX_MODES` (was a `frozenset`) for consistency; `_INDEX_MODES` must stay an ordered sequence for `sampled_from`. - `indexers(...)` now returns `tuple[Selection, Selection]` (the real `zarr.core.indexing.Selection` type) instead of `tuple[Any, Any]`. Assisted-by: ClaudeCode:claude-opus-4.8
2ac5f75 to
b75c702
Compare
| # write, mirroring the historical per-mode guards: | ||
| # - vindex setitem is chunk-dependent for repeated coords (skipped) | ||
| # - oindex/mask writes to sharded arrays are unsupported (GH2834) | ||
| # - oindex writes with repeated indices are unspecified | ||
| if mode == "vindex": | ||
| return | ||
| if mode in ("oindex", "mask") and zarray.shards is not None: | ||
| return | ||
| if mode == "oindex" and _has_repeated_indices(npsel): | ||
| return |
There was a problem hiding this comment.
this is why i didn't do this haha.
From a quick glance I can't tell what makes it through.
There was a problem hiding this comment.
yeah i hate this! i'm looking for something less dirty
…ted write Replace the brittle silent-`return` write guards with a principled split: - test_indexing_read_parity: reads (sync/out=/async), all modes, no guards. - test_indexing_write_parity: writes, filtering with `assume` only the cases with no well-defined oracle — repeated coordinates (order-dependent; a fundamental limitation shared with NumPy, not a bug). Now exercises vindex writes, sharded single-axis oindex/mask writes, and unsharded multi-axis oindex writes that the old blanket `assume(shards is None)` skipped. - test_oindex_sharded_multiaxis_write_xfail: pins the one genuinely-unsupported write (orthogonal across >=2 array axes on a sharded array, GH2834) as a strict xfail, visible at the pytest level. Assisted-by: ClaudeCode:claude-opus-4.8
…dex normalization CI slow-hypothesis falsified test_indexing_write_parity with a vindex write [2, -2, 0, 1] on length 4 (-2 -> 2, so cell 2 written twice — undefined, order-dependent). The old filter checked raw values and missed the post- normalization collision. Replace _has_repeated_indices with mode-aware _write_is_unambiguous(mode, npsel, shape) that normalizes negatives first (per-axis for oindex, coordinate-tuples for vindex). Test-oracle gap, not a bug. Assisted-by: ClaudeCode:claude-opus-4.8
… + dtype-strict oracle Borrow the strongest ideas from TensorStore's and ndindex's indexing test suites: - Stateful harness (TensorStore driver_testutil pattern, NumPy oracle): _IndexingStateMachine applies many random reads/writes (basic/oindex/vindex/mask) to one array while a NumPy model tracks it in lockstep, asserting equality after every step. Catches read-modify-write, chunk-boundary, and shard-merge bugs the single-shot parity tests cannot. Runs over regular/sharded/3D geometries, gated behind --run-slow-hypothesis like the store stateful tests. - Error parity (ndindex check_same): test_indexing_bounds_error_parity generates possibly-out-of-bounds integer selections and asserts zarr raises iff NumPy raises (zarr's bounds errors subclass IndexError). Catches the silent-wrong-data class — returning garbage where NumPy would raise. - Strict read oracle: assert dtype, not just values/shape, for ndim>=1 results. - Regression anchor: test_write_is_unambiguous pins the negative-index collision fix ([2,-2,0,1] -> cell 2 written twice). Assisted-by: ClaudeCode:claude-opus-4.8
…cope The strategy covers only the element-space indexing modes that have a direct NumPy-array oracle (basic/oindex/vindex/mask). Rename it so the name states that, and document that block indexing is deliberately excluded — it addresses the chunk grid (parametrized by the grid, not shape) and has no NumPy equivalent, so it lives in the separate block_indices strategy. Assisted-by: ClaudeCode:claude-opus-4.8
mkdocs renders single backticks for inline code; the docstrings used RST-style double backticks. Normalize to single backticks across the indexing strategies and property tests. Assisted-by: ClaudeCode:claude-opus-4.8
…ambiguous NumPy's deterministic last-write-wins is an implementation detail of serial, C-order index iteration, not a guarantee. zarr writes scalars into chunks and cannot in general promise a write order, so it cannot be expected to match NumPy's choice — hence no well-defined oracle for duplicate-target writes. Assisted-by: ClaudeCode:claude-opus-4.8
…ases Replace raw pytest.param tuples with the conftest Expect dataclass (input / output / id), matching the established idiom in test_chunk_grids etc. Assisted-by: ClaudeCode:claude-opus-4.8
test_lazy_view_indexing_parity is skipif(not hasattr(Array, "lazy")), so it never runs in this eager-only PR — dead weight that falsely reads as coverage. It (and the windows strategy, whose only consumer it was) belong with the lazy indexing work, not here. The consumer-agnostic assert_read_matches_numpy harness stays, so the lazy PR can re-add the view test on top. Assisted-by: ClaudeCode:claude-opus-4.8
dcherian
left a comment
There was a problem hiding this comment.
to be honest, I find this extremely hard to read with all the extra indirection. The duplication in the previous version made it far more readable. And given that this is all agent-generated, why not just duplicated and optimize for readability?
| self.zarray = zarr.create_array( | ||
| {}, shape=self.shape, chunks=self.chunks, shards=self.shards, dtype="i4" | ||
| ) | ||
| self.zarray[...] = self.model |
There was a problem hiding this comment.
Why not use the generator here? and set self.model = self.array[:]
There was a problem hiding this comment.
do you mean get our array values from a generator? I'm not sure that we want this for checking sequences of array indexing operations, since our assumption here is that indexing is insensitive to the scalar values. a deterministic sequence seems like a better set of initial values?
noted! the reason for the indirection is to facilitate testing of lazy indexing, which supports the numpy-style indexing semantics, and so requires the exact same set of tests. Without this indirection, I worried that we would have to write |
|
per @dcherian's recommendation, I'm going to go back to duplicated per-mode tests. We can deal with the cost of duplication (if any) when we start wiring up lazy indexing. |
…view feedback) Address @dcherian's review on zarr-developers#4109: the single mode-dispatched oracle (test_indexing_read_parity / _write_parity going through _get / _setitem / assert_read_matches_numpy) was hard to read, and the stacked `assume()` write filters made it impossible to tell which selections were actually exercised. - Replace the two folded tests with explicit per-mode read/write tests (test_basic_read, test_oindex_read, ... test_mask_write). Each is self-contained, calls the real accessor directly, and still covers sync + async + out= reads. - Writes are now CONSTRUCTIVE instead of generate-then-`assume`-reject: oindex draws unique per-axis indices, vindex draws distinct coordinates via unravel_index, so every write is well-defined by construction (nothing hidden behind assume). - Drop the now-unused assert_read_matches_numpy / _async_get / _VECTORIZED_MODES. The mode-dispatch helpers that remain are used only by the stateful machine (which is inherently random-mode); noted as such. - State machine: set model from the array (`self.model = self.zarray[:]`) per review. The consumer-agnostic shared harness stays on the lazy PR (zarr-developers#3906), where a lazy view actually reuses it — introducing that abstraction at the point of second use. Assisted-by: ClaudeCode:claude-opus-4.8
…n (roborev zarr-developers#341) Same fix as on the lazy branch, applied to the state machine here (the per-mode write parity tests are already constructive). _write_is_unambiguous / _n_array_axes were fed the broadcast np.ix_-style npsel instead of the raw per-axis zsel, so for oindex the per-axis uniqueness check tripped on tiling and the state machine's write rule skipped every multi-axis orthogonal write. Pass zsel; clarify the helper docstrings; add test_write_filter_rejects_broadcast_oindex pinning the invariant. Assisted-by: ClaudeCode:claude-opus-4.8
🤖 AI text below 🤖
Standalone test-infrastructure improvement extracted from the lazy-indexing work (#3906). It hardens the existing (eager) indexing API and has no dependency on the lazy
IndexTransformwork, so it can land independently. No lazy-indexing tests are included here — those live with the lazy PR; this PR only sets up the harness so that work can reuse it.What this adds
Reusable Hypothesis strategy (
zarr.testing.strategies)numpy_array_indexers(mode, shape)→ a(zarr_selection, numpy_selection)pair for any ofbasic/oindex/vindex/mask— the element-space modes that have a direct NumPy-array oracle — so an indexing test is written once and parametrized over mode. (Block indexing is deliberately out of scope: it addresses the chunk grid, not array elements, and has no NumPy equivalent; it keeps its ownblock_indicesstrategy.)NumPy-oracle parity tests (
tests/test_properties.py), folding the old per-mode tests into a small set:test_indexing_read_parity— every mode, for sync read, read into anout=buffer, and async read (addsout=coverage the per-mode tests lacked). The read oracle is dtype-strict (asserts dtype, not just values).test_indexing_write_parity— every mode; ill-defined writes (a cell hit more than once after negative-index normalization) are filtered via_write_is_unambiguous, and the known sharded multi-axis-oindexlimitation is anxfail, not a silent skip.test_indexing_bounds_error_parity— out-of-bounds integer selections must raise in zarr iff they raise in NumPy (zarr's bounds errors subclassIndexError). Catches the silent-wrong-data class — returning garbage where NumPy would raise.test_write_is_unambiguous— regression anchor for the duplicate-target filter.Stateful model-lockstep harness (slow-gated; runs in the Slow Hypothesis job):
_IndexingStateMachineapplies many random reads/writes to one array while a NumPy model tracks it in lockstep, asserting equality after every step — catching read-modify-write / chunk-boundary / shard-merge bugs the single-shot tests can't see. Runs over regular, sharded, and 3-D geometries.These borrow the strongest ideas from TensorStore's
driver_testutil(model lockstep) and ndindex'scheck_same(error parity + dtype-strict comparison).Forward-compatibility (no lazy dependency)
The read oracle is consumer-agnostic —
assert_read_matches_numpy(target, ...)wheretargetis anyzarr.Array— so once lazy indexing lands, the lazy PR can pass a lazy view through this exact harness instead of duplicating it.Verified on
main: fulltests/test_properties.pygreen (16 passed, 3 skipped [the slow state machines], 1 xfailed); the Slow Hypothesis job exercises the state machines; ruff/mypy clean.🤖 Generated with Claude Code