Use Store.get_many for whole-chunk reads in BatchedCodecPipeline #4113

Draft
TomNicholas wants to merge 2 commits into zarr-developers:main from TomNicholas:feat/pipeline-use-get-many

@TomNicholas

Member

Builds on #4112. BatchedCodecPipeline.read now fetches a whole (non-sharded) request with a single Store.get_many call instead of one get per chunk, so a store can batch/coalesce the underlying reads — independently of codec_pipeline.batch_size, which still governs only decode batching.

The sharding codec's partial-decode path is unchanged, and stores without a specialized get_many fall back to the previous concurrent per-chunk behavior.

Motivation — xref #1758 (request coalescing), #1806 (batched Store API), and zarr-developers/VirtualiZarr#947 (files-as-shards / consolidating small reads).

Stacked on #4112 — its commit is the first one here; review after it. Draft.

Add a public, overridable `Store.get_many` that retrieves many values at
once - each request being a whole key or a `(key, byte_range)` pair. It
generalizes `Store.get_ranges` (many ranges of one key) to many keys, and
yields `(request_index, Buffer | None)` batches in completion order so a
store can coalesce reads that land in the same underlying object.
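
To illustrate what coalescing reads that land in the same object could look like, here is a minimal sketch. The request shape `(key, (start, end))` with an exclusive end, the `coalesce` helper, and the `max_gap` parameter are all assumptions made for this example, not the interface added in this PR.

```python
from collections import defaultdict

# Hypothetical request shape for illustration: (key, (start, end)), end exclusive.
Request = tuple[str, tuple[int, int]]
MergedRead = tuple[int, int, list[int]]  # (start, end, original request indices)


def coalesce(requests: list[Request], max_gap: int = 0) -> dict[str, list[MergedRead]]:
    """Group requests by key and merge ranges at most `max_gap` bytes apart."""
    by_key: dict[str, list[tuple[int, int, int]]] = defaultdict(list)
    for index, (key, (start, end)) in enumerate(requests):
        by_key[key].append((start, end, index))

    merged: dict[str, list[MergedRead]] = {}
    for key, ranges in by_key.items():
        ranges.sort()  # order by start offset so neighbours are adjacent
        out: list[MergedRead] = []
        for start, end, index in ranges:
            if out and start - out[-1][1] <= max_gap:
                # Close enough to the previous read: extend it.
                prev_start, prev_end, indices = out[-1]
                out[-1] = (prev_start, max(prev_end, end), indices + [index])
            else:
                out.append((start, end, [index]))
        merged[key] = out
    return merged


requests = [("a", (0, 10)), ("b", (0, 4)), ("a", (10, 20)), ("a", (100, 120))]
plan = coalesce(requests, max_gap=0)
```

Each merged entry keeps the original request indices, so a store could slice the larger read back apart and report results per request.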

The ABC default fetches requests concurrently with `get`, so every store
works out of the box; stores with a bulk backend override it (`FsspecStore`
coalesces via fsspec's `cat_ranges`). Coalescing tuning is left to each
store rather than exposed on the interface.

This restores and generalizes the batched-fetch capability of the v2
`getitems` Store API (see gh-1806).

BatchedCodecPipeline.read now fetches the encoded bytes for an entire
(non-sharded) read with a single Store.get_many call, instead of one
Store.get per chunk. It drives get_many over all chunk keys, scatters the
completion-ordered (index, buffer) results back into position, and feeds
them to the per-batch decode path.

This lets a store batch or coalesce the underlying reads (e.g. FsspecStore
via cat_ranges, or a custom store such as virtualizarr's ManifestStore /
icechunk's IcechunkStore that overrides get_many) regardless of
codec_pipeline.batch_size, which still governs only decode batching. The
sharding codec's partial-decode path is untouched, and stores without a
specialized get_many fall back to the previous concurrent per-chunk gets.

@TomNicholas force-pushed the feat/pipeline-use-get-many branch from d8a292d to 4f1ad9f on July 1, 2026 21:00

@codecov

codecov Bot commented Jul 1, 2026

Codecov Report

❌ Patch coverage is 93.15068% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.51%. Comparing base (1ab9953) to head (4f1ad9f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/zarr/core/codec_pipeline.py | 87.50% | 5 Missing ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4113   +/-   ##
=======================================
  Coverage   93.50%   93.51%           
=======================================
  Files          90       90           
  Lines       11981    12051   +70     
=======================================
+ Hits        11203    11269   +66     
- Misses        778      782    +4     

| Files with missing lines | Coverage Δ |
|---|---|
| src/zarr/abc/store.py | 96.38% <100.00%> (+0.20%) ⬆️ |
| src/zarr/storage/_fsspec.py | 91.50% <100.00%> (+0.17%) ⬆️ |
| src/zarr/testing/store.py | 99.46% <100.00%> (+0.02%) ⬆️ |
| src/zarr/core/codec_pipeline.py | 94.26% <87.50%> (-1.03%) ⬇️ |

... and 1 file with indirect coverage changes

@ilan-gold

Contributor

Can you highlight what the relation of this to #3925 is?
