Skip to content

test: comprehensive eager indexing parity oracle + reusable indexing strategies#4109

Open
d-v-b wants to merge 13 commits into
zarr-developers:mainfrom
d-v-b:lazy-indexing-pr2-indexing-tests
Open

test: comprehensive eager indexing parity oracle + reusable indexing strategies#4109
d-v-b wants to merge 13 commits into
zarr-developers:mainfrom
d-v-b:lazy-indexing-pr2-indexing-tests

Conversation

@d-v-b

@d-v-b d-v-b commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

🤖 AI text below 🤖

Standalone test-infrastructure improvement extracted from the lazy-indexing work (#3906). It hardens the existing (eager) indexing API and has no dependency on the lazy IndexTransform work, so it can land independently. No lazy-indexing tests are included here — those live with the lazy PR; this PR only sets up the harness so that work can reuse it.

What this adds

Reusable Hypothesis strategy (zarr.testing.strategies)

  • numpy_array_indexers(mode, shape) → a (zarr_selection, numpy_selection) pair for any of basic / oindex / vindex / mask — the element-space modes that have a direct NumPy-array oracle — so an indexing test is written once and parametrized over mode. (Block indexing is deliberately out of scope: it addresses the chunk grid, not array elements, and has no NumPy equivalent; it keeps its own block_indices strategy.)

NumPy-oracle parity tests (tests/test_properties.py), folding the old per-mode tests into a small set:

  • test_indexing_read_parity — every mode, for sync read, read into an out= buffer, and async read (adds out= coverage the per-mode tests lacked). The read oracle is dtype-strict (asserts dtype, not just values).
  • test_indexing_write_parity — every mode; ill-defined writes (a cell hit more than once after negative-index normalization) are filtered via _write_is_unambiguous, and the known sharded multi-axis-oindex limitation is an xfail, not a silent skip.
  • test_indexing_bounds_error_parity — out-of-bounds integer selections must raise in zarr iff they raise in NumPy (zarr's bounds errors subclass IndexError). Catches the silent-wrong-data class — returning garbage where NumPy would raise.
  • test_write_is_unambiguous — regression anchor for the duplicate-target filter.

Stateful model-lockstep harness (slow-gated; runs in the Slow Hypothesis job):

  • _IndexingStateMachine applies many random reads/writes to one array while a NumPy model tracks it in lockstep, asserting equality after every step — catching read-modify-write / chunk-boundary / shard-merge bugs the single-shot tests can't see. Runs over regular, sharded, and 3-D geometries.

These borrow the strongest ideas from TensorStore's driver_testutil (model lockstep) and ndindex's check_same (error parity + dtype-strict comparison).

Forward-compatibility (no lazy dependency)

The read oracle is consumer-agnostic — assert_read_matches_numpy(target, ...) where target is any zarr.Array — so once lazy indexing lands, the lazy PR can pass a lazy view through this exact harness instead of duplicating it.

Verified on main: full tests/test_properties.py green (16 passed, 3 skipped [the slow state machines], 1 xfailed); the Slow Hypothesis job exercises the state machines; ruff/mypy clean.

🤖 Generated with Claude Code

…strategies

Extracted from the lazy-indexing work (PR zarr-developers#3906) as a standalone, eager-only
improvement that hardens the *existing* indexing API and stands on its own.

- zarr.testing.strategies gains two reusable, documented strategies:
  - indexers(mode, shape): one strategy returning a (zarr_selection,
    numpy_selection) pair for any of basic / oindex / vindex / mask, so indexing
    tests are written once and parametrized over mode.
  - windows(shape): non-negative, full-rank slice windows for sub-region tests.
- test_properties.py folds the four per-mode indexing tests into a single
  NumPy-oracle test, test_indexing_parity, covering {basic, oindex, vindex, mask}
  for sync read, read into an out= buffer, async read, and write (with the
  historical per-mode write guards preserved). This adds out=-buffer coverage the
  per-mode tests lacked.
- A consumer-agnostic helper (assert_read_matches_numpy) and a
  test_lazy_view_indexing_parity test are included but the latter is skipped
  unless Array.lazy exists, so the lazy-indexing feature can later reuse the same
  oracle without re-deriving any setup.

Assisted-by: ClaudeCode:claude-opus-4.8
@github-actions github-actions Bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jun 30, 2026
@codecov

codecov Bot commented Jun 30, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 62.50000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.47%. Comparing base (1ab9953) to head (230bdff).

Files with missing lines Patch % Lines
src/zarr/testing/strategies.py 62.50% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4109      +/-   ##
==========================================
- Coverage   93.50%   93.47%   -0.04%     
==========================================
  Files          90       90              
  Lines       11981    11997      +16     
==========================================
+ Hits        11203    11214      +11     
- Misses        778      783       +5     
Files with missing lines Coverage Δ
src/zarr/testing/strategies.py 93.85% <62.50%> (-1.54%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…nt mode collections

- Type the `mode` parameter as the `IndexMode` literal (not `str`) across the
  test helpers (`_get`, `_async_get`, `_setitem`, `_eligible`,
  `assert_read_matches_numpy`).
- Make `_VECTORIZED_MODES` a tuple like `_INDEX_MODES` (was a `frozenset`) for
  consistency; `_INDEX_MODES` must stay an ordered sequence for `sampled_from`.
- `indexers(...)` now returns `tuple[Selection, Selection]` (the real
  `zarr.core.indexing.Selection` type) instead of `tuple[Any, Any]`.

Assisted-by: ClaudeCode:claude-opus-4.8
@d-v-b d-v-b force-pushed the lazy-indexing-pr2-indexing-tests branch from 2ac5f75 to b75c702 Compare June 30, 2026 14:16
Comment thread tests/test_properties.py Outdated
Comment on lines +445 to +454
# write, mirroring the historical per-mode guards:
# - vindex setitem is chunk-dependent for repeated coords (skipped)
# - oindex/mask writes to sharded arrays are unsupported (GH2834)
# - oindex writes with repeated indices are unspecified
if mode == "vindex":
return
if mode in ("oindex", "mask") and zarray.shards is not None:
return
if mode == "oindex" and _has_repeated_indices(npsel):
return

@dcherian dcherian Jun 30, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is why i didn't do this haha.

From a quick glance I can't tell what makes it through.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i hate this! i'm looking for something less dirty

d-v-b added 2 commits June 30, 2026 17:15
…ted write

Replace the brittle silent-`return` write guards with a principled split:
- test_indexing_read_parity: reads (sync/out=/async), all modes, no guards.
- test_indexing_write_parity: writes, filtering with `assume` only the cases with
  no well-defined oracle — repeated coordinates (order-dependent; a fundamental
  limitation shared with NumPy, not a bug). Now exercises vindex writes, sharded
  single-axis oindex/mask writes, and unsharded multi-axis oindex writes that the
  old blanket `assume(shards is None)` skipped.
- test_oindex_sharded_multiaxis_write_xfail: pins the one genuinely-unsupported
  write (orthogonal across >=2 array axes on a sharded array, GH2834) as a strict
  xfail, visible at the pytest level.

Assisted-by: ClaudeCode:claude-opus-4.8
…dex normalization

CI slow-hypothesis falsified test_indexing_write_parity with a vindex write
[2, -2, 0, 1] on length 4 (-2 -> 2, so cell 2 written twice — undefined,
order-dependent). The old filter checked raw values and missed the post-
normalization collision. Replace _has_repeated_indices with mode-aware
_write_is_unambiguous(mode, npsel, shape) that normalizes negatives first
(per-axis for oindex, coordinate-tuples for vindex). Test-oracle gap, not a bug.

Assisted-by: ClaudeCode:claude-opus-4.8
Comment thread src/zarr/testing/strategies.py Outdated
d-v-b added 6 commits June 30, 2026 21:05
… + dtype-strict oracle

Borrow the strongest ideas from TensorStore's and ndindex's indexing test suites:

- Stateful harness (TensorStore driver_testutil pattern, NumPy oracle):
  _IndexingStateMachine applies many random reads/writes (basic/oindex/vindex/mask)
  to one array while a NumPy model tracks it in lockstep, asserting equality after
  every step. Catches read-modify-write, chunk-boundary, and shard-merge bugs the
  single-shot parity tests cannot. Runs over regular/sharded/3D geometries, gated
  behind --run-slow-hypothesis like the store stateful tests.

- Error parity (ndindex check_same): test_indexing_bounds_error_parity generates
  possibly-out-of-bounds integer selections and asserts zarr raises iff NumPy
  raises (zarr's bounds errors subclass IndexError). Catches the silent-wrong-data
  class — returning garbage where NumPy would raise.

- Strict read oracle: assert dtype, not just values/shape, for ndim>=1 results.

- Regression anchor: test_write_is_unambiguous pins the negative-index
  collision fix ([2,-2,0,1] -> cell 2 written twice).

Assisted-by: ClaudeCode:claude-opus-4.8
…cope

The strategy covers only the element-space indexing modes that have a direct
NumPy-array oracle (basic/oindex/vindex/mask). Rename it so the name states that,
and document that block indexing is deliberately excluded — it addresses the chunk
grid (parametrized by the grid, not shape) and has no NumPy equivalent, so it lives
in the separate block_indices strategy.

Assisted-by: ClaudeCode:claude-opus-4.8
mkdocs renders single backticks for inline code; the docstrings used RST-style
double backticks. Normalize to single backticks across the indexing strategies
and property tests.

Assisted-by: ClaudeCode:claude-opus-4.8
…ambiguous

NumPy's deterministic last-write-wins is an implementation detail of serial,
C-order index iteration, not a guarantee. zarr writes scalars into chunks and
cannot in general promise a write order, so it cannot be expected to match NumPy's
choice — hence no well-defined oracle for duplicate-target writes.

Assisted-by: ClaudeCode:claude-opus-4.8
…ases

Replace raw pytest.param tuples with the conftest Expect dataclass (input /
output / id), matching the established idiom in test_chunk_grids etc.

Assisted-by: ClaudeCode:claude-opus-4.8
test_lazy_view_indexing_parity is skipif(not hasattr(Array, "lazy")), so it never
runs in this eager-only PR — dead weight that falsely reads as coverage. It (and
the windows strategy, whose only consumer it was) belong with the lazy indexing
work, not here. The consumer-agnostic assert_read_matches_numpy harness stays, so
the lazy PR can re-add the view test on top.

Assisted-by: ClaudeCode:claude-opus-4.8
@d-v-b d-v-b marked this pull request as ready for review June 30, 2026 20:08
@d-v-b d-v-b requested a review from dcherian July 1, 2026 13:19

@dcherian dcherian left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be honest, I find this extremely hard to read with all the extra indirection. The duplication in the previous version made it far more readable. And given that this is all agent-generated, why not just duplicated and optimize for readability?

Comment thread tests/test_properties.py Outdated
Comment on lines +646 to +649
self.zarray = zarr.create_array(
{}, shape=self.shape, chunks=self.chunks, shards=self.shards, dtype="i4"
)
self.zarray[...] = self.model

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the generator here? and set self.model = self.array[:]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean get our array values from a generator? I'm not sure that we want this for checking sequences of array indexing operations, since our assumption here is that indexing is insensitive to the scalar values. a deterministic sequence seems like a better set of initial values?

@d-v-b

d-v-b commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

to be honest, I find this extremely hard to read with all the extra indirection. The duplication in the previous version made it far more readable. And given that this is all agent-generated, why not just duplicated and optimize for readability?

noted! the reason for the indirection is to facilitate testing of lazy indexing, which supports the numpy-style indexing semantics, and so requires the exact same set of tests. Without this indirection, I worried that we would have to write _lazy versions of every indexing method test, and this invites mistakes.

@d-v-b

d-v-b commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

per @dcherian's recommendation, I'm going to go back to duplicated per-mode tests. We can deal with the cost of duplication (if any) when we start wiring up lazy indexing.

d-v-b added 2 commits July 1, 2026 17:44
…view feedback)

Address @dcherian's review on zarr-developers#4109: the single mode-dispatched oracle
(test_indexing_read_parity / _write_parity going through _get / _setitem /
assert_read_matches_numpy) was hard to read, and the stacked `assume()` write
filters made it impossible to tell which selections were actually exercised.

- Replace the two folded tests with explicit per-mode read/write tests
  (test_basic_read, test_oindex_read, ... test_mask_write). Each is self-contained,
  calls the real accessor directly, and still covers sync + async + out= reads.
- Writes are now CONSTRUCTIVE instead of generate-then-`assume`-reject: oindex draws
  unique per-axis indices, vindex draws distinct coordinates via unravel_index, so
  every write is well-defined by construction (nothing hidden behind assume).
- Drop the now-unused assert_read_matches_numpy / _async_get / _VECTORIZED_MODES.
  The mode-dispatch helpers that remain are used only by the stateful machine
  (which is inherently random-mode); noted as such.
- State machine: set model from the array (`self.model = self.zarray[:]`) per review.

The consumer-agnostic shared harness stays on the lazy PR (zarr-developers#3906), where a lazy
view actually reuses it — introducing that abstraction at the point of second use.

Assisted-by: ClaudeCode:claude-opus-4.8
…n (roborev zarr-developers#341)

Same fix as on the lazy branch, applied to the state machine here (the per-mode
write parity tests are already constructive). _write_is_unambiguous / _n_array_axes
were fed the broadcast np.ix_-style npsel instead of the raw per-axis zsel, so for
oindex the per-axis uniqueness check tripped on tiling and the state machine's write
rule skipped every multi-axis orthogonal write. Pass zsel; clarify the helper
docstrings; add test_write_filter_rejects_broadcast_oindex pinning the invariant.

Assisted-by: ClaudeCode:claude-opus-4.8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs release notes Automatically applied to PRs which haven't added release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants