Skip to content

Statically-typed configuration object (donfig retained as the config reader)#4101

Open
d-v-b wants to merge 33 commits into
zarr-developers:mainfrom
d-v-b:statically-typed-config
Open

Statically-typed configuration object (donfig retained as the config reader)#4101
d-v-b wants to merge 33 commits into
zarr-developers:mainfrom
d-v-b:statically-typed-config

Conversation

@d-v-b

@d-v-b d-v-b commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Gives zarr.config a statically-typed, dataclass-backed representation in src/zarr/core/config.py, while keeping donfig as the reader for environment variables and YAML config files. zarr.config now provides precise static types for both attribute access (zarr.config.array.order) and the dotted-string API (zarr.config.get("array.order")), plus runtime validation with typo suggestions — while donfig's config-file locations, parsing, and precedence are preserved by construction.

Motivation: donfig stores config as an untyped nested dict, so every value is typed Any and typos in keys go undetected. This models the config as frozen dataclasses (the typed source of truth) and surfaces the familiar string API through hand-written @overloads that relate each dotted key to its value type. The goal is a solidly-typed config, not removing donfig — the reader has no typing value, so donfig keeps doing it.

Note: an earlier revision of this PR removed donfig and reimplemented its env/YAML ingest. Per review feedback (thanks @maxrjones), that reimplementation is exactly where several regressions originated (config-file locations, etc.), so donfig has been reinstated as the reader. The dataclass remains the typed representation; only the reading layer is donfig's.

Design

  • Schema: frozen, slotted dataclasses (ZarrConfig + ArraySettings / AsyncSettings / ThreadingSettings / CodecPipelineSettings). codecs is an open Mapping[str, str] subtree so arbitrary codec names still register at runtime.
  • Typed string API: hand-written @overloads map each dotted key ("array.order") to its value type (Literal["C", "F"]), plus a str fallback for codecs.*/unknown keys. A drift-protection test asserts every structured key has a matching overload, so the two can't fall out of sync.
  • Subtree access: subtree reads return the typed settings dataclass, which also supports donfig-style item access (config.get("array")["order"]) via a small __getitem__ mixin, alongside attribute access (config.get("array").order).
  • async namespace: async is a Python keyword, so the dataclass field is async_ while the serialized key stays "async"; the string API (config.get("async.concurrency")) is unaffected.
  • Reader: build_config() uses donfig.Config("zarr") to read ZARR_* env vars and YAML files, then flattens the result and applies it onto the typed defaults (unknown keys warned + skipped). donfig owns discovery/parsing/precedence and its documented file locations.
  • State model: a process-global _base snapshot plus a ContextVar overlay. A bare config.set(...) updates only the global base, so a permanent set is visible across threads (last-writer-wins, matching donfig); with config.set(...) is promoted to a context-local overlay on __enter__ and unwound on __exit__, so scoped changes don't leak between threads/tasks or into the permanent base.

Backwards compatibility

The public zarr.config surface is preserved, and donfig-backed reading keeps env/YAML behavior identical. Verified across all in-repo consumers (registry, sync, abc/codec, codec_pipeline, group, abc/store, array_spec, metadata v2/v3):

  • config.get("a.b.c"[, default]), subtree config.get("codecs", {}).get(key), and subtree item/attribute access
  • config.set({...}) (permanent, cross-thread) and with config.set({...}) (scoped), plus the config.set(key=value) kwargs form
  • config.reset(), config.refresh(), config.enable_gpu(), config.defaults
  • ZARR_FOO__BAR env-var ingestion and YAML config files (donfig's standard locations: ZARR_CONFIG, ~/.config/zarr, /etc/zarr, sys.prefix/site.PREFIXES)
  • the deprecations mechanism (set raises BadConfigError on a removed key; get honors the default — both donfig-faithful)

Intentional, ratified behavior changes

  • Unknown keys — "validate at the API, parse external input": programmatic config.set/config.get now raise on an unknown structured key (typo detection — the point of the typed config), while env-var / YAML ingest tolerates unknown keys (skips with a warning) so a stray ZARR_* var or version-skewed config file can't crash import zarr. The open codecs.* namespace still accepts arbitrary keys.
  • config.defaults now returns a nested dict (donfig returned a one-element list[dict]).
  • Subtree reads return a typed settings object rather than a plain dict; attribute and item access both work, but dict-only operations (.keys(), dict(...)) no longer apply to a subtree.

These are documented in the changed changelog fragment.

Dependency change

  • donfig>=0.8 retained (now used only as the env/YAML reader).
  • No new runtime dependency (pyyaml arrives transitively via donfig, as before).

Testing

  • Full suite green locally (excluding the pre-existing, environment-flaky moto S3-server fixture, which CI covers): 6217 passed, 270 skipped, 4 xfailed; mypy clean (strict) via pre-commit.
  • tests/test_config_typed.py: schema/path helpers, env + YAML ingest through donfig (codecs-block deep-merge, directory scanning, precedence, unknown-key tolerance, ~/.config/zarr discovery), the promote-on-enter state model incl. cross-thread visibility, context-local with blocks, the nested-permanent-set no-leak case, subtree item access, deprecations, the overload drift-protection test, and assert_type static-typing assertions.

Notes for reviewers


🤖 Generated with Claude Code

d-v-b and others added 17 commits June 25, 2026 11:37
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
Add _ENV_META_VARS frozenset containing ZARR_CONFIG and skip those
names in collect_env() before stripping the prefix. This prevents
ZARR_CONFIG=/path/to/cfg.yaml from becoming {"config": "..."} which
crashed build_config() with KeyError because ZarrConfig has no
'config' field. Add two regression tests covering the skip and the
no-raise behaviour.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
After any reset() or set() call, a ContextVar scope entry is planted in
the current context. Without re-syncing _scope in refresh(), a subsequent
rebuild of _base is invisible in that context because _current() always
prefers the scope entry. Fix refresh() to call self._scope.set(self._base)
after rebuilding, matching the pattern already used in reset(). Also remove
the dead contextlib.suppress(LookupError) in reset() — ContextVar.set()
never raises LookupError.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
Remove the donfig Config subclass and its module-level instance.
Promote ZarrConfigManager (the typed proxy built in Tasks 1-3) to the
public `config` export.  Update tests to the new nested-dict `defaults`
form and fix a `codec_pipeline.name` typo (should be `codec_pipeline.path`).
Also add **kwargs support to `set()` for call-sites that pass top-level keys
as keyword arguments, and raise BadConfigError (ValueError) for removed
deprecated keys to preserve the backwards-compat guarantee tested by
`test_deprecated_config`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
`_apply_deprecation` now accepts a `raise_on_removed` parameter.
`set()` passes `raise_on_removed=True` (keeps raising `BadConfigError`
for removed keys).  `get()` passes `raise_on_removed=False` so removed
keys are treated as absent: a caller-supplied default is returned, or
`KeyError` is raised when no default is given.  This restores the
donfig-faithful behavior where `config.get("removed.key", default)`
never raised.  Also removes the now-dead `if resolved is None: continue`
branch from `set()` and rewrites the module docstring to drop the
phrase "mirrors the old donfig interface".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
apply_overrides() now catches KeyError from replace_path() for each
unknown key, emits a ZarrUserWarning naming the skipped key, and
continues. This prevents a stray ZARR_*=... env var or extra YAML key
from crashing build_config() (and therefore import zarr).

config.set() remains strict — it calls replace_path() directly, not
through apply_overrides().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
- Add _structured_leaf_keys helper (uses get_type_hints to handle
  from __future__ import annotations) and test_every_structured_key_has_a_get_overload
  that walks ZarrConfig recursively and asserts every leaf has a get() overload
- Add TYPE_CHECKING smoke function using assert_type to prove precise
  static return types for config.get() and attribute access
- Create changes/+statically-typed-config.misc.md changelog fragment
- Update docs/user-guide/config.md: remove donfig prose, fix pprint call
- Update docs/user-guide/installation.md: replace donfig dep with pyyaml
- Pin pyyaml>=6 in pyproject.toml project dependencies

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
…ch path

Remove the `k != "codecs"` guard in `_flatten_mapping` so a YAML `codecs:`
block produces flat dotted keys (e.g. `codecs.bytes`) that flow through
`_replace_recursive`'s Mapping branch and MERGE into the existing codec dict
rather than replacing it wholesale. Also fix `_config_search_paths` to accept
and consult the `environ` mapping supplied by `build_config` instead of
reading `os.environ` directly, making `build_config(environ=...)` self-consistent.
Adds regression tests (RED/GREEN) and a changelog note about `config.defaults`
now returning a plain `dict` instead of donfig's `list[dict]`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
The design spec and implementation plan live in a gist instead:
https://gist.github.com/d-v-b/2a95ff0104824ef52545ed9baf1b66c3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.84615% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.50%. Comparing base (1ab9953) to head (ae7d5b3).

Files with missing lines Patch % Lines
src/zarr/core/config.py 93.84% 16 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #4101    +/-   ##
========================================
  Coverage   93.50%   93.50%            
========================================
  Files          90       90            
  Lines       11981    12228   +247     
========================================
+ Hits        11203    11434   +231     
- Misses        778      794    +16     
Files with missing lines Coverage Δ
src/zarr/core/config.py 93.96% <93.84%> (-6.04%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

d-v-b and others added 6 commits June 25, 2026 17:35
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
…test

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
set() takes Mapping[str, Any] (no static value validation) because precise
typing needs an open TypedDict (declared keys + arbitrary codecs.* str keys),
which mypy 2.x does not support; a closed TypedDict would break the open
config.set({'codecs.<name>': ...}) idiom. Revisit when mypy ships PEP 728 or
we adopt a checker that supports it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
Use object instead of Any for parameters/returns that merely store, pass
through, or compare values (path helpers, env/YAML ingest, set/update kwargs,
token, parse_indexing_order). Any is kept where it is load-bearing: the dynamic
dataclasses.replace/fields dispatch, the get() fallback overload that powers
config.get('codecs', {}).get(name), and the heterogeneous nested tree returned
by to_nested_dict/defaults/to_dict (navigated by key). object also let mypy
narrow parse_indexing_order, removing a redundant cast.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
@d-v-b

d-v-b commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

I'm adding some useful error messages:

  5 Traceback (most recent call last):
  4   File "<stdin>", line 1, in <module>
  3   File "/home/d-v-b/wt/statically-typed-config/src/zarr/core/config.py", line 483, in get
  2     raise _unknown_key_error(key, current) from None
  1 KeyError: "'arraya' is not a valid configuration key. Did you mean 'array'?"

@d-v-b d-v-b marked this pull request as ready for review June 26, 2026 08:10
Comment thread src/zarr/core/config.py
Comment on lines +399 to +429
def get(self, key: Literal["default_zarr_format"]) -> Literal[2, 3]: ...
@overload
def get(self, key: Literal["array.order"]) -> Literal["C", "F"]: ...
@overload
def get(self, key: Literal["array.write_empty_chunks"]) -> bool: ...
@overload
def get(self, key: Literal["array.read_missing_chunks"]) -> bool: ...
@overload
def get(self, key: Literal["array.target_shard_size_bytes"]) -> int | None: ...
@overload
def get(self, key: Literal["array.rectilinear_chunks"]) -> bool: ...
@overload
def get(self, key: Literal["array.sharding_coalesce_max_gap_bytes"]) -> int: ...
@overload
def get(self, key: Literal["array.sharding_coalesce_max_bytes"]) -> int: ...
@overload
def get(self, key: Literal["async.concurrency"]) -> int: ...
@overload
def get(self, key: Literal["async.timeout"]) -> float | None: ...
@overload
def get(self, key: Literal["threading.max_workers"]) -> int | None: ...
@overload
def get(self, key: Literal["json_indent"]) -> int: ...
@overload
def get(self, key: Literal["codec_pipeline.path"]) -> str: ...
@overload
def get(self, key: Literal["codec_pipeline.batch_size"]) -> int: ...
@overload
def get(self, key: Literal["buffer"]) -> str: ...
@overload
def get(self, key: Literal["ndbuffer"]) -> str: ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flagging this as nice, but delicate (polite term for brittle). These overloads have to be kept synced up with the actual structure of the configuration, which is defined elsewhere. This PR adds tests that confirm that these overloaded signatures match the field name + type of the config object. I'm not sure how we can do do any better than that, if we want a string-based API like this to be statically typed.

an alternative that we can add later would be a non-string-based API, where we define an object that acts like a cursor / lens over our config.

Comment thread tests/test_config.py
def test_config_codec_implementation(store: Store) -> None:
# has default value
assert fully_qualified_name(get_codec_class("blosc")) == config.defaults[0]["codecs"]["blosc"]
assert fully_qualified_name(get_codec_class("blosc")) == config.defaults["codecs"]["blosc"]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

donfig returned defaults as a list, we just return a dict

d-v-b and others added 2 commits June 26, 2026 10:28
config.get/config.set on an unknown key now raise a KeyError that names the key
and suggests the most similar valid one (via difflib), e.g.
'array.0rder' -> "Did you mean 'array.order'?". Candidates are the schema's
container and leaf keys plus current codecs.* entries. Still a KeyError, so
existing handlers are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
…evel

When an unknown key has no close match, list the available keys at the deepest
resolvable level (capped at 10, '... (N more)' beyond). Suggestions are now
scoped to the failed segment vs that level's children, which avoids misleading
prefix matches (e.g. 'codecs.unknown' no longer suggests 'codecs.numcodecs.zstd';
it lists the codec names instead). Use explicit len()/!= '' checks rather than
collection truthiness, and is_dataclass for narrowing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XKHgWSxDXtTmNgAebZg41U
The first `set`/`get` pair set `codec_pipeline.path` to the default
`BatchedCodecPipeline` and asserted that default, so it passed
regardless of whether `set` did anything — a no-op assertion donfig's
permissiveness had masked (it also carried the original `.name` typo).
The Mock pipeline block below already exercises set->get->actual-use
with a non-default class, so this pair was redundant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018fnKiuy15kq7cPgRSKD3dr
@d-v-b

d-v-b commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

I was motivated to open this PR because I think we need a more explicit data structure for our config. Donfig worked well but a morass of untyped dicts is not the UX we want if we plan to be agile with deprecations and feature additions.

@d-v-b d-v-b requested review from chuckwondo and maxrjones June 26, 2026 08:33
@maxrjones

Copy link
Copy Markdown
Member

I really like this direction @d-v-b! Thanks for working on it.

I'll give it a spin, likely on Monday.

@maxrjones maxrjones left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this @d-v-b!

I like that this brings validation of configuration settings along with suggestions for fixing wrong configuration inputs. I'm neutral about donfig vs. just pyyaml as a dependency. I don't think replacing donfig is strictly necessary for getting strict type checking, but also donfig doesn't bring a whole lot of extra value as a dependency such that it's motivating to keep it.

There are some regressions relative to donfig that I think should be addressed or at a minimum included in a changelog fragment under the changed category.

The hybrid state model leads to cross-thread permanent-set shadowing

import threading
from zarr.core.config import ZarrConfigManager
cfg = ZarrConfigManager(); cfg.set({"async.concurrency": 1})
t=threading.Thread(target=lambda: cfg.set({"async.concurrency": 999})); t.start(); t.join()
print("branch main sees:", cfg.get("async.concurrency"))
# -> branch main sees: 1
import threading
from donfig import Config
c = Config("z", defaults=[{"async": {"concurrency": 10}}]); c.set({"async.concurrency": 1})
t=threading.Thread(target=lambda: c.set({"async.concurrency": 999})); t.start(); t.join()
print("donfig main sees:", c.get("async.concurrency"))
# -> donfig main sees: 999

Subtree .get() returns a dataclass

from zarr.core.config import config
print("type:", type(config.get("array")).__name__)
config.get("array")["order"] # used to work

Previous config file locations are no longer picked up

See inlined comment + https://donfig.readthedocs.io/en/latest/configuration.html#yaml-files

Comment thread changes/4101.misc.md Outdated
Comment thread docs/user-guide/config.md Outdated
@d-v-b

d-v-b commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

I like that this brings validation of configuration settings along with suggestions for fixing wrong configuration inputs. I'm neutral about donfig vs. just pyyaml as a dependency. I don't think replacing donfig is strictly necessary for getting strict type checking, but also donfig doesn't bring a whole lot of extra value as a dependency such that it's motivating to keep it.

the core of this PR is defining our config as a dataclass that we manage. how could we do that, and still use donfig? iirc, donfig just exposes an untyped dict-like API. So there would be no way to relate the contents of the donfig config with our normative dataclass. Maybe I'm missing some donfig feature or something?

@maxrjones

Copy link
Copy Markdown
Member

I like that this brings validation of configuration settings along with suggestions for fixing wrong configuration inputs. I'm neutral about donfig vs. just pyyaml as a dependency. I don't think replacing donfig is strictly necessary for getting strict type checking, but also donfig doesn't bring a whole lot of extra value as a dependency such that it's motivating to keep it.

the core of this PR is defining our config as a dataclass that we manage. how could we do that, and still use donfig? iirc, donfig just exposes an untyped dict-like API. So there would be no way to relate the contents of the donfig config with our normative dataclass. Maybe I'm missing some donfig feature or something?

I think if you consider the goal being managing our config internally as dataclasses, donfig must go. If the goal is provide typing and suggestions for configuration settings, that can be built on top of donfig since types can be made more strict.

…YAML file discovery

Follow-up to review feedback on the statically-typed config (PR zarr-developers#4101),
restoring donfig-compatible behavior that the initial implementation regressed:

- Threading: a bare `config.set(...)` now updates only the process-global base,
  so a permanent set from any thread is visible everywhere (last-writer-wins).
  The ContextVar scope is established in `_ConfigSet.__enter__`/`__exit__`, so a
  `with config.set(...)` stays context-local. Fixes cross-thread shadowing where
  a thread that had already done a permanent set kept seeing its own stale value.
- Subtree access: settings dataclasses gain a `_ConfigNode.__getitem__` mixin so
  `config.get("array")["order"]` works again alongside `.order`, raising KeyError
  on unknown keys as the old dicts did.
- YAML file locations: `_config_search_paths` reproduces donfig's full search
  order (/etc/zarr via ZARR_ROOT_CONFIG, sys.prefix + site.PREFIXES etc/zarr,
  ~/.config/zarr, and ZARR_CONFIG last), so existing zarr.yaml files keep being
  discovered. Documented in the config user guide.

Also adds a `changed` towncrier fragment category (mirroring the built-in
defaults) and moves the PR's changelog entry there. Extends test_config_typed
with coverage for cross-thread visibility, context-local `with` blocks, subtree
item access, the search-path ordering, and end-to-end YAML loading (directory
scanning, extension filtering, precedence, unknown-key tolerance, and
~/.config/zarr discovery).

Assisted-by: ClaudeCode:claude-opus-4.8
@d-v-b

d-v-b commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

If the goal is provide typing and suggestions for configuration settings, that can be built on top of donfig since types can be made more strict.

Can you outline how this would work?

…ent set

Addresses a roborev review finding. A bare `config.set(...)` executed inside an
active `with config.set(...)` block computed its new snapshot from `_current()`
(the block's overlay) and wrote it to the global base, so an overlay-only key
such as `array.order` leaked and persisted permanently after the block exited.

`set` now derives two snapshots: `scoped` (layered on the current view, pinned by
`_ConfigSet.__enter__` for the `with` path) and `permanent` (layered on the
global base, written eagerly). A nested permanent set therefore carries only its
own delta onto the base; the block's overlay reverts cleanly on exit. Outside any
`with` block the two snapshots are identical, so the common path is unchanged.

Assisted-by: ClaudeCode:claude-opus-4.8
@d-v-b

d-v-b commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

maybe this aligns with your thinking, but I'm going to explore keeping donfig as the config reader, and use the dataclass as the source of truth for the structure of the config

…s the typed representation

The goal of this PR is a solidly-typed config, not removing donfig. Removing
donfig meant reimplementing its ingest (env parsing, YAML discovery, precedence),
which is exactly where the regressions reviewers found originated. donfig adds no
typing value, so it now handles reading again while the dataclass remains the
typed source of truth.

- `build_config` delegates env and YAML ingest to `donfig.Config("zarr")`, then
  flattens the result and applies it onto the typed defaults via `apply_overrides`.
  donfig owns discovery, parsing, precedence, and its documented config-file
  locations, so those are preserved by construction.
- Deleted the reimplemented `collect_env`, `collect_yaml`, `_config_search_paths`,
  `_parse_env_value`, and the `ENV_PREFIX`/`_ENV_META_VARS` constants (and the now
  unused `ast`/`contextlib`/`os`/`site`/`sys` imports). Kept `_flatten_mapping`
  (donfig's nested dict to dotted keys) and `apply_overrides` (the dict to dataclass
  boundary, which still warns and skips unknown keys). donfig's reader also surfaces
  the `ZARR_CONFIG`/`ZARR_ROOT_CONFIG` path directives as `config`/`root_config`
  keys, which are stripped before applying.
- Restored `donfig>=0.8` as a dependency (deps, min-deps pin, upstream git dep,
  version-report) and dropped the direct `pyyaml` dep (donfig brings it).
- Reworked the config tests: env and YAML behavior is now exercised end-to-end
  through donfig via a `monkeypatch`-based helper; dropped the now-internal
  collect/search-path unit tests; the `~/.config/zarr` discovery guard remains.

The typed representation, the `get` overloads, runtime validation/suggestions, and
the fixed cross-thread/context-local `set` model are unchanged.

Assisted-by: ClaudeCode:claude-opus-4.8
@d-v-b d-v-b marked this pull request as draft July 1, 2026 08:42
@d-v-b d-v-b changed the title Replace donfig with a statically-typed configuration object Statically-typed configuration object (donfig retained as the config reader) Jul 1, 2026
The towncrier config and changes/README.md gained a changed category, but the
CI changelog validator had its own hardcoded type allowlist. Add changed there
so 4101.changed.md passes the check.

Assisted-by: ClaudeCode:claude-opus-4.8
@d-v-b d-v-b marked this pull request as ready for review July 1, 2026 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants