refactor: clean-slate methodology template rebuild (copier + staged-contract pipeline) by nullhack · Pull Request #187 · nullhack/temple8

nullhack · 2026-07-03T14:42:15Z

Summary

A clean-slate rebuild of temple8 as a pure methodology template (copier-based), replacing the prior beehave/BDD orchestration layer with a staged-contract pipeline wired into flowr.

What's in

Pipeline (flowr): discover → explore → plan → build → deliver → shipped; 6 flows; one owner per state; evidence-based gates (CI is the enforcement backstop).
Methodology layer: 11 agents, 21 flow-bound + 4 standalone skills, 35 knowledge files across methodology/, workflow/, requirements/, software-craft/, architecture/, writing/, design/.
Staged contracts: test .pyi → test .py (@pytest.mark.pending) → source .pyi → simulate; build implements source .py from the fixed .pyi one cycle at a time; mypy.stubtest is the sole .pyi/.py drift detector.
Secrets model: dotenv_values() into a frozen Settings (never os.environ); secrets out-of-workspace at ~/.secrets/<project>.env; agent instruct/ask-never-create protocol; gitleaks in CI.
Copier instantiator: copier.yml questionnaire, pyproject.toml.jinja + README.md.jinja, project-instantiator agent + instantiate-project skill; instances start empty of source/tests by design.
Docstring lifecycle (phased): source naked during build, stripped at select, regenerated at merge-to-dev; dev lint = bug-catchers, merge lint adds SIM/RUF + format.
Design + architecture + writing domains grounded in current standards (WCAG 2.2, RFC 9457, Norman, Nielsen, Bass, Evans, Vernon, Kobak 2024, Jackson 2026).
Research folder behind every citation (verify-never-recall).
CI: ruff + flowr validate always on; pyright/stubtest/pytest guarded on package+tests.
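
The secrets model listed above can be sketched roughly as follows. This is a minimal illustration, not the template's actual code: `read_env_file` is a deliberately tiny stand-in for python-dotenv's `dotenv_values()` so the block runs on the standard library alone, and the `DEMO_API_TOKEN` key, the `api_token` field, and the `load` helper are hypothetical names. The point it shows is that values flow from `~/.secrets/<project>.env` straight into a frozen object without ever touching `os.environ`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional


def read_env_file(path: Path) -> Dict[str, str]:
    # Stand-in for dotenv_values(): handles plain KEY=VALUE lines only.
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def secrets_path(project: str, home: Optional[Path] = None) -> Path:
    # Secrets live outside the workspace, under the user's home directory.
    return (home or Path.home()) / ".secrets" / f"{project}.env"


@dataclass(frozen=True)
class Settings:
    api_token: str  # hypothetical field, for illustration only

    @classmethod
    def load(cls, project: str, home: Optional[Path] = None) -> "Settings":
        # Values go from the file into the frozen object; os.environ is untouched.
        values = read_env_file(secrets_path(project, home))
        return cls(api_token=values["DEMO_API_TOKEN"])
```

Because the dataclass is frozen, any later attempt to reassign `api_token` raises an error, which keeps configuration read-only after startup.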

Verification

6 flows validate {valid:true}
ruff check . clean
citation graph closed (35 knowledge files, zero forward-refs)
copier .jinja files render with zero leftover markers; conditional _exclude logic verified both ways
end-to-end copier copy smoke-test deferred to post-merge (template files now committed)

Notes

The weather-lookup dogfood was developed inside temple8 to validate the pipeline, then fully reverted (3ba8599); its commits remain in history but net to nothing.
Dependabot/security alerts on the default branch are pre-existing.

@pending

Replaces the removed spec-driven/beehave orchestration layer on the refactor/clean-slate branch (old layer preserved in .backup/, recoverable from origin/main).

Flow set (.flowr/flows/):
- pipeline-flow orchestrates discover -> explore -> plan -> build -> deliver -> shipped
- 5 subflows; 21 inner states; one owner and one skill per state
- staged-contract surface: test .pyi -> test .py (@pending) -> source .pyi -> simulate
- @pytest.mark.pending is the backlog signal and skip mechanism
- stubtest (mypy.stubtest) drift-checks source and test .pyi at every gate

Methodology (.opencode/):
- knowledge/methodology/: separation-of-concerns, agent-files, skill-files, knowledge-files
- agents/: 11-agent roster (5 primary owners, 4 specialist doers, 2 consult-only)
- skills/: 21 lean per-state procedures (Load step, numbered, IF-THEN/wikilink cues)
- knowledge/requirements/interview-techniques.md authored (shared funnel technique)

Tooling and CI:
- pyproject: PYI added to ruff select; D/pydocstyle dropped (no-docstrings policy); mypy + vcrpy/pytest-vcr dev deps; pending marker registered; stubtest task
- .github/workflows/ci.yml: ruff + flowr validate always-on; pyright/stubtest/pytest guarded on package+tests presence (evidence vs enforcement)
- conftest.py: pending-marker skip hook
- .templates/docs/glossary.md.template: ubiquitous-language glossary skeleton

Docs: AGENTS.md rewritten for the new lifecycle; TODO.md tracks open work.
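
A pending-marker skip hook of the kind this commit describes might look like the sketch below. It is an assumption about the shape of `conftest.py`, not a copy of it: the skip reason text is invented, and the sketch relies on the `pending` marker already being registered in pyproject as the commit states.

```python
import pytest

# Hypothetical wording; the real hook's reason string is not shown in the PR.
PENDING_REASON = "pending: contract not implemented yet"


def pytest_collection_modifyitems(config, items):
    # Build the skip marker once, then attach it to every collected test
    # that still carries @pytest.mark.pending.
    skip_pending = pytest.mark.skip(reason=PENDING_REASON)
    for item in items:
        if item.get_closest_marker("pending") is not None:
            item.add_marker(skip_pending)
```

Under this design, removing `@pytest.mark.pending` from a test is the whole "un-mark" step: the test stops being skipped and starts reporting its real failure.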

Drop interview-techniques from interview-features and consolidate-interview load lists — those levels do structural decomposition and synthesis, not CIT/Laddering elicitation, so loading the shared technique bled undirected elicitation into them. interview-techniques is now cited only by the two elicitation levels (interview-general, interview-cross-cutting), which direct every technique they load. Update TODO with the missing tracks: flow design debt (plan-flow rework gaps), pyproject+tooling cleanup, docs (deleted-but-referenced README), and the knowledge-to-author list.

…rename

Author the DDD decomposition knowledges and retitle the funnel's third level off the product term "feature" onto the DDD term "building block".
- Add requirements/domain-decomposition.md (aggregate-first decomposition, gap analysis as coverage matrix) and requirements/aggregate-boundaries.md (sizing/splitting aggregates); repoint citations in interview-techniques and the interview/review skills
- Rename interview-features -> interview-building-blocks: state, skill, trigger (building-blocks-done, aligned with general/cross-cutting-done), description ("Tactical decomposition"); drop the contradictory L2 step
- Reframe agent identities off "feature" -> "building block" across domain-expert, product-owner, reviewer, release-engineer
- discovery-flow 2.1.0 -> 2.2.0
- derive-source-stubs input: test .pyi -> test .py (bodies hold the domain types the stubs are derived from)
- Rationalize working-state paths: probe scripts explore/ -> .cache/explore/ (consolidates gitignored working state); rewrite .gitignore lean and project-only; add a Project layout section to AGENTS.md

…rch bar

Author two domain knowledges and encode the depth/research authoring bar into the methodology.
- requirements/ubiquitous-language.md: shared rigorous vocabulary as a translation-killer (Evans 2003; Fowler 2006); discovered-then-conquered not transcribed (Avanscoperta; Protean); one meaning per bounded context; genus-differentia; language and code co-evolve (Shore); curated not exhaustive. Completes the discover phase (4/4)
- software-craft/external-fixtures.md: record-once-replay-forever (vcrpy record_mode=once; CI VCR_RECORD_MODE=none); kind-dispatch table as the spine (vcrpy HTTP-only; DB/queue/storage capture strategies); two scrubs (safety + determinism); capture-is-truth. Completes the explore phase (1/1)
- methodology/knowledge-files.md: add two concepts -- depth deepens across the four sections (Content never thinner than Concepts; Diataxis/arc42/Divio/Williams) and grounding/research (research the topic, cite inline; inaccurate knowledge is worse than none)

Three software-craft knowledges for the plan-phase review and authoring skills, researched against canonical sources and rethought for this flow:
- test-stubs.md (99ln): PEP 484 .pyi signature files for tests, disambiguated from the Meszaros "Test Stub" test double. Documents why the .pyi-preferred rule hides drift from pyright (PEP 484/mypy/PEP 561), why stubtest is the sole detector (runtime inspect diff, structure not return-accuracy), the complete-module-surface rule, and stdlib-typing-only with types under TYPE_CHECKING.
- test-design.md (85ln): observable behaviour not implementation (Meszaros); sociable tests at two grains only: integration (narrow) and E2E (broad), no solitary unit (Fowler, Vocke); one behaviour per test; spec value fidelity; property tests (Hypothesis) prove invariants (MacIver).
- code-review.md (85ln): adversarial review (Fagan 1976, Tetlock), fail-fast at the first defect, report-only (minor is not a pass), two review modes (review-test-stubs = coverage/scope/happy-path vs interview; review-implementation = correctness/quality/drift/green), PASS/FAIL record per criterion. Conventions (ruff/docstrings) are CI's job, not review's.

Knowledge progress: 8/18 authored (Discover 4/4, Explore 1/1, Plan 3/8).

…atalogue

Completes the Plan-phase software-craft cluster (7/7). Each rethought for this flow against canonical sources:
- source-stubs.md (88ln): source .pyi DERIVED from test bodies (inverse of conventional); signature-only ...; FIXED during build (escalate on gap); no prescribed layout; config 12-factor; structural artifacts keyed to the module (ORM->migration, adapter->cassette); scoped stubtest.
- solid.md (55ln): the 5 SOLID principles (Martin 2000/2003) with a violation->smell->fix map; applied when a smell triggers, never speculatively.
- object-calisthenics.md (60ln): Jeff Bay's 9 rules (ThoughtWorks Anthology 2008) as an exercise-turned-guideline; KT as the reference list, Content as the per-rule mechanism table; enforced as the quality bar, not a literal counter.
- smell-catalogue.md (82ln): Fowler's (1999) 5 categories with Smell/Signal/Fix tables; a Comment is a symptom of code needing extraction (which the no-comments policy makes the rule).

Knowledge progress: 12/18 authored (Discover 4/4, Explore 1/1, Plan software-craft 7/7). Remaining software-craft: Build (tdd, design-patterns, refactoring-techniques) + Deliver (git-conventions, versioning); plus requirements/spec-simulation (Plan).

Build-phase software-craft cluster, each rethought for this flow:
- tdd.md (77ln): red/green/refactor ADAPTED -- tests pre-exist, so red un-marks + confirms the right failure (ImportError=new, assertion=rework); minimum code YAGNI/KISS; refactor under green with .pyi frozen + design-only; per-contract cycle in outside-in dependency order; scoped stubtest.
- design-patterns.md (64ln): GoF patterns applied only when a smell triggers (Gamma 1994, Shvets 2014); foregrounds the small set this flow's architecture hosts (Adapter at boundary, Repository, Facade/app-service, Strategy/State, Factory, value object); smell->pattern lookup.
- refactoring-techniques.md (87ln): Fowler's (1999) moves organised by problem category as a reference index (drops step-by-step mechanics to skill/book); .pyi fixed under green; convention compliance is CI's job, not refactor's.

Knowledge progress: 15/18 authored. Remaining software-craft: git-conventions, versioning (Deliver); plus requirements/spec-simulation (Plan).

Deliver-phase software-craft cluster; completes all 13 software-craft knowledges:
- git-conventions.md (75ln): Conventional Commits form; one logical change per commit (ship-unit = one contract, .pyi unchanged); refactor separate from feature; three-branch model feature->dev->release/main; squash-merge into dev gates whole-suite + whole-suite stubtest.
- versioning.md (59ln): SemVer 2.0.0 bump rules + 0.x instability; PEP 440 for Python (X.Y.Z core, +local stripped by indexes); CalVer when timing beats compatibility; pyproject version is single source of truth; publish-release picks notes / PR to main / v{version} tag.

Software-craft cluster complete (13/13). Overall knowledge: 17/18 authored. Sole remaining cited knowledge: requirements/spec-simulation (Plan phase, requirements/).
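
For orientation, the SemVer 2.0.0 bump rules that versioning.md summarises can be sketched as below. This is an illustrative helper rather than anything shipped in the template: the `Version` type and the `breaking` / `feature` / `fix` labels are hypothetical, and the sketch handles only the X.Y.Z core (no pre-release, build, or PEP 440 local segments).

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Accepts only the plain X.Y.Z core.
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, change: str) -> "Version":
        # SemVer: a breaking change resets minor and patch,
        # a feature resets patch, a fix increments patch only.
        if change == "breaking":
            return Version(self.major + 1, 0, 0)
        if change == "feature":
            return Version(self.major, self.minor + 1, 0)
        if change == "fix":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change kind: {change!r}")

    def tag(self) -> str:
        # Matches the v{version} tag form named in the commit message.
        return f"v{self.major}.{self.minor}.{self.patch}"
```

Because `Version` is a tuple, ordinary comparison already orders releases correctly for the X.Y.Z core, so `Version.parse("1.10.0") > Version.parse("1.9.3")` holds without extra code.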

Complete rewrite of spec-simulation, addressing that the prior approach created JSON I/O files for ceremony and did not catch the e2e-affecting issues (composition, cross-test coherence):
- spec-simulation.md (105ln): simulation is a MENTAL EXECUTION of the contract set (test .pyi + test .py + source .pyi) -- the tests ARE the spec, there is no domain_spec.md to walk. The e2e failures live in composition + cross-test coherence, which tools are blind to: a type imported from a module that does not re-export it; one value in two shapes across tests; a shared module with no test; a dependency cycle. Method: walk the e2e path hop-by-hop (type/value/side-effect tracing to a backing contract) + trace each domain value across tests. Tool floor (pyright/stubtest/no-orphans/traceability) necessary but not sufficient. Output is a judgment + named gaps -- no per-walkthrough JSON, no cache.
- simulate-contracts skill step 2 now DIRECTS the walkthrough per the leak principle (was a bare 'answer the question'); tool checks kept as the floor.

Knowledge cluster COMPLETE: 18/18 authored (Discover 4/4, Explore 1/1, Plan 8/8, Build 3/3, Deliver 2/2).

Defining-by-negation against artifacts that do not exist in this project (referencing the backup's mechanisms a reader has never seen) is noise.
- spec-simulation.md: drop 'there is no domain_spec.md', 'not prose', 'no separate prose document', and the 'old ceremony / per-walkthrough JSON / cache directory' contrast passage. Reframe KT6, the 'Judgment' concept, and the 'What gets walked' / 'The output' subsections positive.
- source-stubs.md: drop 'skeleton category' / 'skeleton or ports layer' (backup hexagonal-ports phantom); state the layout point positive.

Audited all 18 knowledge files: remaining negations ('no human in the loop', 'no third verdict', 'no way to traverse another aggregate', the runtime state-change 'no longer') are real constraints, not phantom contrasts -- kept.

Rethink pyproject + tooling per the project's intent: ruff catches real issues early without enforcing the style/sorting/docstrings that churn on every refactor.

Ruff:
- Drop the 'conventions' task, the style backdoor that enforced import sorting (I), annotations (ANN), naming (N), pycodestyle (E/W), etc. The main select was never the culprit; this backdoor was.
- select kept: A ASYNC B C9 DTZ ERA F G LOG PYI RUF S SIM (bug + security + simplify; all stable + website-searchable).
- select dropped: FURB (refactor suggestions = rework-on-refactor), PT (half stylistic), T20 (fights legitimate CLI prints), preview=true (unstable rules aren't reliably searchable).
- per-file-ignores: tests/** -> [S101, S404] (drop vestigial ANN, D); drop scripts/*.py (no such dir).

Deps + tooling:
- De-beehave: drop pytest-beehave + [tool.beehave] + the 'deprecated' marker.
- Consolidate to one [project.optional-dependencies].dev; delete [dependency-groups].dev. Move flowr from runtime to dev; drop flowr[viz].
- Drop pdoc + ghp-import + the 3 doc tasks (blocked on package; re-add later).
- requires-python >=3.13 (flowr>=1.2.1 requires it; 3.12 was the intent).
- CI Python 3.14 -> 3.13 (match the floor).
- release-check = lint && static-check && stubtest && test (was conventions && ... && doc-build).
- Identity (name/version/description/readme/urls) kept unchanged per decision.

Verified: uv sync --extra dev clean; ruff check . clean; 6 flows validate.

First real pipeline run (session: dogfood) through discover + explore.

Discover (condensed interview, product-owner role):
- .cache/dogfood/interview-notes.md -- 4-level funnel for weather-lookup (general/cross-cutting/building-blocks/consolidation); 4 contracts emerge: Settings, WeatherAdapter (external), History (persistence), WeatherService (CLI e2e); outside-in order; zero gaps.
- docs/glossary.md -- 6 terms across weather-lookup + history contexts.

Explore (integration-engineer role; real network probes of open-meteo):
- tests/cassettes/open-meteo/geocoding.yaml -- Berlin hit + Xyzqwerty miss (200 with no 'results' is the unknown-city error, branched on body not status).
- tests/cassettes/open-meteo/forecast.yaml -- Berlin current conditions.
- Cassettes scrubbed: bodies decoded (decode_compressed_response), volatile headers stripped (date/server/cf-ray/content-encoding/...).
- .cache/dogfood/{probe-target,probe-research,external-contracts}.md + .cache/explore/open-meteo/probe.py (gitignored working state).

Validated: pipeline auto-enters discovery subflow; 4-state discover chain; discover->explore->plan subflow chaining fires on real exits. Session now at plan-flow/author-test-stubs.

Plan phase of the dogfood pipeline run (session: dogfood).

author-test-stubs -> review -> write-test-py -> derive-source-stubs -> simulate:
- tests/integration/{settings,weather,history}_test.{py,pyi}
- tests/e2e/cli_test.{py,pyi}
  9 tests, all @pytest.mark.pending, deferred SUT imports, skipping cleanly.
- app/{__init__,settings,weather,history,cli}.pyi derived from the test bodies.
- Cassette consolidated to one per service (tests/cassettes/open-meteo/open-meteo.yaml, 3 interactions) so the e2e flow replays both calls from a single file; drops the per-endpoint geocoding.yaml + forecast.yaml.

Coherence fix surfaced by the simulate walkthrough: the forecast interaction was recorded with geocoded coords (52.52437/13.41053) so geocode->forecast shares one coordinate shape across the forecast test, the e2e test, and the cassette (vcr matches on query). Forecast value 16.1/6.5/weather_code 0.

Gates: pyright 0 errors (9 reportMissingModuleSource = expected, no source .py yet); stubtest on test pairs Success; 6 evidence asserted at simulate -> contracts-ready. Session now at tdd-flow/select.

- Revert dogfood instance from the template root (app/, tests/, docs/glossary.md, cassettes); it lives in /tmp as a separate instance project.
- New knowledge: research folder (29 source cards + card template + research-files knowledge), secrets-and-config (12-factor secrets/config split, LLM-agent threat model, dotenv_values over load_dotenv, agent instruct/ask protocol), design-patterns rework (language-agnostic 22-pattern catalog), and the journal/plan->explore escalation wiring (needs-capture exit + cross-phase journal.md).
- New templates: glossary, state (living spec), README, research card, ADR, .env.example.
- ADR model (docs/decisions/2026-07-02-use-pyi-first-contracts.md) + standalone record-decision skill + system-architect identity carrying the ADR-awareness.
- Flow description slim (orientation kernels); input artifacts point at .pyi; skill and knowledge cross-links de-duplicated.
- CI: gitleaks secret-scanner job; ruff bug-catcher select (no docstrings, D dropped).
- pyproject: de-beehave, consolidated dev deps, requires-python >=3.13, httpx reverted.

…ture layers

- copier instantiator: copier.yml, pyproject.toml.jinja, README.md.jinja, project-instantiator agent, instantiate-project skill
- secrets-and-config: dotenv_values over load_dotenv, out-of-workspace ~/.secrets/<project>.env, agent instruct/ask protocol, .env.example template, gitleaks CI job
- design domain: interaction/visual/asset/cli/api/accessibility knowledges + design-interaction, design-visual-asset skills + social-card/logo SVG templates
- architecture domain: quality-attributes, context-mapping
- writing domain: ai-language-markers (Kobak 2024, Jackson 2026)
- docstring lifecycle: phased model (stripped at select, regenerated at merge), scripts/strip_docstrings.py, dev/merge lint split
- workflow domain: flowr-operations knowledge
- AGENTS.md lean rewrite: operating discipline + driving-a-state loop
- research folder: writing/architecture/design/process cards
- drop TODO.md and stale .templates/README.md.template

nullhack added 19 commits July 1, 2026 11:05

feat(settings): implement Settings config contract

daf8299

Frozen dataclass + from_env classmethod reading WEATHER_GEOCODING_BASE / WEATHER_FORECAST_BASE / DATABASE_URL with open-meteo defaults. Un-marks the 2 settings tests.
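
A sketch of what this contract could look like, under stated assumptions: the three variable names come from the commit message, but the field names, the default URLs, and the choice to pass the environment in as an explicit mapping are guesses made for illustration (the commit only says "open-meteo defaults").

```python
from dataclasses import dataclass
from typing import Mapping

# Assumed default values; the commit does not spell them out.
DEFAULT_GEOCODING_BASE = "https://geocoding-api.open-meteo.com/v1"
DEFAULT_FORECAST_BASE = "https://api.open-meteo.com/v1"
DEFAULT_DATABASE_URL = "sqlite:///history.db"


@dataclass(frozen=True)
class Settings:
    geocoding_base: str
    forecast_base: str
    database_url: str

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "Settings":
        # Each key falls back to its default when the mapping omits it.
        return cls(
            geocoding_base=env.get("WEATHER_GEOCODING_BASE", DEFAULT_GEOCODING_BASE),
            forecast_base=env.get("WEATHER_FORECAST_BASE", DEFAULT_FORECAST_BASE),
            database_url=env.get("DATABASE_URL", DEFAULT_DATABASE_URL),
        )
```

Taking the mapping as a parameter keeps tests hermetic: a test passes a plain dict, and production code can pass whatever the secrets loader produced.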

feat(weather): implement WeatherAdapter over open-meteo

74e5475

httpx client; geocode raises LookupError on unknown city (body has no 'results'); forecast returns Conditions. Replays tests/cassettes/open-meteo. Adds httpx as a project dependency.
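
The body-based branching can be illustrated without the HTTP layer. The sketch below is not the adapter itself: `Coordinates` and `parse_geocoding` are hypothetical names, and it assumes the decoded JSON body carries `latitude` and `longitude` on each entry of `results`.

```python
from typing import Any, Mapping, NamedTuple


class Coordinates(NamedTuple):  # hypothetical value type
    latitude: float
    longitude: float


def parse_geocoding(city: str, body: Mapping[str, Any]) -> Coordinates:
    # The miss arrives as HTTP 200 with no 'results' key,
    # so the error is decided from the body, not the status code.
    results = body.get("results")
    if not results:
        raise LookupError(f"unknown city: {city}")
    first = results[0]
    return Coordinates(float(first["latitude"]), float(first["longitude"]))
```

A caller would hand this function the parsed JSON of the geocoding response; the forecast step can then reuse the returned coordinates unchanged, which is the single coordinate shape the simulate walkthrough asked for.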

feat(history): implement History persistence over sqlite3

af98dce

Normalizes database_url (strips sqlite:/// prefix); record inserts; recent returns LookupRecords latest-first (ORDER BY id DESC).
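
A minimal sketch of this persistence contract, assuming a single `lookups` table with hypothetical `city` and `temperature` columns (the commit names only `record`, `recent`, `LookupRecords`, the prefix stripping, and the `ORDER BY id DESC` ordering):

```python
import sqlite3
from typing import List, NamedTuple

SQLITE_PREFIX = "sqlite:///"


class LookupRecord(NamedTuple):
    id: int
    city: str
    temperature: float


def normalize_database_url(database_url: str) -> str:
    # sqlite3.connect wants a path, so strip the URL-style prefix if present.
    if database_url.startswith(SQLITE_PREFIX):
        return database_url[len(SQLITE_PREFIX):]
    return database_url


class History:
    def __init__(self, database_url: str) -> None:
        self._conn = sqlite3.connect(normalize_database_url(database_url))
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS lookups ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "city TEXT NOT NULL, temperature REAL NOT NULL)"
        )

    def record(self, city: str, temperature: float) -> None:
        # The connection context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT INTO lookups (city, temperature) VALUES (?, ?)",
                (city, temperature),
            )

    def recent(self, limit: int = 10) -> List[LookupRecord]:
        # Latest-first: the autoincrement id grows with insertion order.
        rows = self._conn.execute(
            "SELECT id, city, temperature FROM lookups ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
        return [LookupRecord(*row) for row in rows]
```

Passing `sqlite:///:memory:` normalizes to `:memory:`, which gives each test a throwaway database.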

feat(cli): implement WeatherService CLI composition

39b69e1

main composes Settings.from_env + WeatherAdapter + History; geocode -> forecast -> record -> print. E2E replays the open-meteo cassette.

nullhack wants to merge 19 commits into main from refactor/clean-slate
