
refactor: clean-slate methodology template rebuild (copier + staged-contract pipeline)#187

Open
nullhack wants to merge 19 commits into main from refactor/clean-slate

nullhack commented Jul 3, 2026


Summary

A clean-slate rebuild of temple8 as a pure methodology template (copier-based), replacing the prior beehave/BDD orchestration layer with a staged-contract pipeline wired into flowr.

What's in

  • Pipeline (flowr): discover → explore → plan → build → deliver → shipped; 6 flows; one owner per state; evidence-based gates (CI is the enforcement backstop).
  • Methodology layer: 11 agents, 21 flow-bound + 4 standalone skills, 35 knowledge files across methodology/, workflow/, requirements/, software-craft/, architecture/, writing/, design/.
  • Staged contracts: test .pyi → test .py (@pytest.mark.pending) → source .pyi → simulate; build implements source .py from the fixed .pyi one cycle at a time; mypy.stubtest is the sole .pyi/.py drift detector.
  • Secrets model: dotenv_values() into a frozen Settings (never os.environ); secrets out-of-workspace at ~/.secrets/<project>.env; agent instruct/ask-never-create protocol; gitleaks in CI.
  • Copier instantiator: copier.yml questionnaire, pyproject.toml.jinja + README.md.jinja, project-instantiator agent + instantiate-project skill; instances start empty of source/tests by design.
  • Docstring lifecycle (phased): docstrings stripped at select, source kept docstring-free during build, docstrings regenerated at merge-to-dev; dev lint runs bug-catchers only, merge lint adds SIM/RUF + format.
  • Design + architecture + writing domains grounded in current standards (WCAG 2.2, RFC 9457, Norman, Nielsen, Bass, Evans, Vernon, Kobak 2024, Jackson 2026).
  • Research folder behind every citation (verify-never-recall).
  • CI: ruff + flowr validate always on; pyright/stubtest/pytest guarded on package+tests.
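A minimal sketch of the secrets model, assuming python-dotenv is installed and a project that needs two hypothetical keys, `API_KEY` and `DATABASE_URL`; `Settings`, `from_values`, `secrets_path`, and `load_settings` are illustrative names, not the template's actual code. `dotenv_values()` parses the file into a dict without writing to `os.environ`, and the result lands in a frozen dataclass.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Immutable config; the fields are hypothetical examples."""

    api_key: str
    database_url: str

    @classmethod
    def from_values(cls, values: Mapping[str, str | None]) -> Settings:
        # dotenv_values() maps a bare `KEY` line to None, so treat None as missing.
        missing = [key for key in ("API_KEY", "DATABASE_URL") if not values.get(key)]
        if missing:
            # Instruct/ask, never create: report the gap instead of inventing a value.
            raise LookupError(f"missing secrets: {', '.join(missing)}")
        return cls(api_key=values["API_KEY"], database_url=values["DATABASE_URL"])


def secrets_path(project: str) -> Path:
    # Out-of-workspace location, so the file never sits inside the repository.
    return Path.home() / ".secrets" / f"{project}.env"


def load_settings(project: str) -> Settings:
    from dotenv import dotenv_values  # python-dotenv; returns a dict, leaves os.environ alone

    return Settings.from_values(dotenv_values(secrets_path(project)))
```

Keeping parsing (`from_values`) separate from file access (`load_settings`) lets tests build a `Settings` from a plain dict without any secrets file on disk.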

Verification

  • 6 flows validate {valid:true}
  • ruff check . clean
  • citation graph closed (35 knowledge files, zero forward-refs)
  • copier .jinja files render with zero leftover markers; conditional _exclude logic verified both ways
  • end-to-end copier copy smoke-test deferred to post-merge (template files now committed)

Notes

  • The weather-lookup dogfood was developed inside temple8 to validate the pipeline, then fully reverted (3ba8599); its commits remain in history but net to nothing.
  • Dependabot/security alerts on the default branch are pre-existing.

nullhack added 19 commits July 1, 2026 11:05
Replaces the removed spec-driven/beehave orchestration layer on the
refactor/clean-slate branch (old layer preserved in .backup/, recoverable
from origin/main).

Flow set (.flowr/flows/):
- pipeline-flow orchestrates discover -> explore -> plan -> build -> deliver -> shipped
- 5 subflows; 21 inner states; one owner and one skill per state
- staged-contract surface: test .pyi -> test .py (@pending) -> source .pyi -> simulate
- @pytest.mark.pending is the backlog signal and skip mechanism
- stubtest (mypy.stubtest) drift-checks source and test .pyi at every gate

Methodology (.opencode/):
- knowledge/methodology/: separation-of-concerns, agent-files, skill-files, knowledge-files
- agents/: 11-agent roster (5 primary owners, 4 specialist doers, 2 consult-only)
- skills/: 21 lean per-state procedures (Load step, numbered, IF-THEN/wikilink cues)
- knowledge/requirements/interview-techniques.md authored (shared funnel technique)

Tooling and CI:
- pyproject: PYI added to ruff select; D/pydocstyle dropped (no-docstrings policy);
  mypy + vcrpy/pytest-vcr dev deps; pending marker registered; stubtest task
- .github/workflows/ci.yml: ruff + flowr validate always-on; pyright/stubtest/pytest
  guarded on package+tests presence (evidence vs enforcement)
- conftest.py: pending-marker skip hook
- .templates/docs/glossary.md.template: ubiquitous-language glossary skeleton

Docs: AGENTS.md rewritten for the new lifecycle; TODO.md tracks open work.

Drop interview-techniques from interview-features and consolidate-interview
load lists — those levels do structural decomposition and synthesis, not
CIT/Laddering elicitation, so loading the shared technique bled undirected
elicitation into them. interview-techniques is now cited only by the two
elicitation levels (interview-general, interview-cross-cutting), which
direct every technique they load.

Update TODO with the missing tracks: flow design debt (plan-flow rework
gaps), pyproject+tooling cleanup, docs (deleted-but-referenced README), and
the knowledge-to-author list.

…rename

Author the DDD decomposition knowledges and retitle the funnel's third
level off the product term "feature" onto the DDD term "building block".

- Add requirements/domain-decomposition.md (aggregate-first decomposition,
  gap analysis as coverage matrix) and requirements/aggregate-boundaries.md
  (sizing/splitting aggregates); repoint citations in interview-techniques
  and the interview/review skills
- Rename interview-features -> interview-building-blocks: state, skill,
  trigger (building-blocks-done, aligned with general/cross-cutting-done),
  description ("Tactical decomposition"); drop the contradictory L2 step
- Reframe agent identities off "feature" -> "building block" across
  domain-expert, product-owner, reviewer, release-engineer
- discovery-flow 2.1.0 -> 2.2.0
- derive-source-stubs input: test .pyi -> test .py (bodies hold the domain
  types the stubs are derived from)
- Rationalize working-state paths: probe scripts explore/ -> .cache/explore/
  (consolidates gitignored working state); rewrite .gitignore lean and
  project-only; add a Project layout section to AGENTS.md

…rch bar

Author two domain knowledges and encode the depth/research authoring bar
into the methodology.

- requirements/ubiquitous-language.md: shared rigorous vocabulary as a
  translation-killer (Evans 2003; Fowler 2006); discovered-then-conquered
  not transcribed (Avanscoperta; Protean); one meaning per bounded context;
  genus-differentia; language and code co-evolve (Shore); curated not
  exhaustive. Completes the discover phase (4/4)
- software-craft/external-fixtures.md: record-once-replay-forever (vcrpy
  record_mode=once; CI VCR_RECORD_MODE=none); kind-dispatch table as the
  spine (vcrpy HTTP-only; DB/queue/storage capture strategies); two scrubs
  (safety + determinism); capture-is-truth. Completes the explore phase (1/1)
- methodology/knowledge-files.md: add two concepts -- depth deepens across
  the four sections (Content never thinner than Concepts; Diataxis/arc42/
  Divio/Williams) and grounding/research (research the topic, cite inline;
  inaccurate knowledge is worse than none)
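A sketch of the two scrubs and the record-once / replay-in-CI switch described for external-fixtures.md, assuming vcrpy's `before_record_response` hook and its recorded-response dict shape (`{"status": ..., "headers": {...}, "body": {...}}`). The header lists and helper names are illustrative. The safety scrub drops anything that can carry a credential; the determinism scrub drops headers that change on every request.

```python
from __future__ import annotations

import os
from typing import Any

# Safety scrub: headers that may carry credentials (illustrative list).
SECRET_HEADERS = frozenset({"authorization", "set-cookie", "x-api-key"})
# Determinism scrub: headers whose values differ between identical requests.
VOLATILE_HEADERS = frozenset({"date", "server", "cf-ray", "content-encoding"})


def scrub_response(response: dict[str, Any]) -> dict[str, Any]:
    """Drop secret and volatile headers from a recorded response (case-insensitive)."""
    drop = SECRET_HEADERS | VOLATILE_HEADERS
    response["headers"] = {
        name: value
        for name, value in response.get("headers", {}).items()
        if name.lower() not in drop
    }
    return response


def record_mode() -> str:
    # Locally, record a missing cassette once; CI sets VCR_RECORD_MODE=none so a
    # missing or mismatched cassette fails instead of reaching the network.
    return os.environ.get("VCR_RECORD_MODE", "once")


def make_vcr() -> Any:
    import vcr  # vcrpy

    return vcr.VCR(
        record_mode=record_mode(),
        decode_compressed_response=True,  # store readable bodies, not gzip bytes
        before_record_response=scrub_response,
    )
```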
Three software-craft knowledges for the plan-phase review and authoring
skills, researched against canonical sources and rethought for this flow:

- test-stubs.md (99ln): PEP 484 .pyi signature files for tests,
  disambiguated from the Meszaros "Test Stub" test double. Documents why
  the .pyi-preferred rule hides drift from pyright (PEP 484/mypy/PEP 561),
  why stubtest is the sole detector (runtime inspect diff, structure not
  return-accuracy), the complete-module-surface rule, and stdlib-typing-only
  with types under TYPE_CHECKING.

- test-design.md (85ln): observable behaviour not implementation (Meszaros);
  sociable tests at two grains only — integration (narrow) and E2E (broad),
  no solitary unit (Fowler, Vocke); one behaviour per test; spec value
  fidelity; property tests (Hypothesis) prove invariants (MacIver).

- code-review.md (85ln): adversarial review (Fagan 1976, Tetlock), fail-fast
  at the first defect, report-only (minor is not a pass), two review modes
  (review-test-stubs = coverage/scope/happy-path vs interview;
  review-implementation = correctness/quality/drift/green), PASS/FAIL record
  per criterion. Conventions (ruff/docstrings) are CI's job, not review's.
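To make "runtime inspect diff, structure not return-accuracy" concrete, here is a deliberately tiny stand-in for what stubtest checks. It is not stubtest (the real check is `python -m mypy.stubtest <module>`); it only compares the parameter names of top-level functions declared in `.pyi` text against `inspect.signature` of the runtime objects, one direction only. `stub_signatures` and `signature_drift` are hypothetical names.

```python
from __future__ import annotations

import ast
import inspect
from types import ModuleType


def stub_signatures(pyi_source: str) -> dict[str, list[str]]:
    """Map each top-level function in .pyi text to its parameter names, in signature order."""
    signatures: dict[str, list[str]] = {}
    for node in ast.parse(pyi_source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            names = [a.arg for a in (*args.posonlyargs, *args.args)]
            if args.vararg is not None:
                names.append(args.vararg.arg)
            names += [a.arg for a in args.kwonlyargs]
            if args.kwarg is not None:
                names.append(args.kwarg.arg)
            signatures[node.name] = names
    return signatures


def signature_drift(pyi_source: str, runtime: ModuleType) -> list[str]:
    """Report structural mismatches; return annotations are never compared."""
    problems: list[str] = []
    for name, stub_params in stub_signatures(pyi_source).items():
        obj = getattr(runtime, name, None)
        if obj is None:
            problems.append(f"{name}: declared in stub, missing at runtime")
            continue
        runtime_params = list(inspect.signature(obj).parameters)
        if runtime_params != stub_params:
            problems.append(f"{name}: stub {stub_params} != runtime {runtime_params}")
    return problems
```

Because only runtime structure is inspected, a function that returns the wrong value still passes; that gap is what the tests cover.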

Knowledge progress: 8/18 authored (Discover 4/4, Explore 1/1, Plan 3/8).

…atalogue

Completes the Plan-phase software-craft cluster (7/7). Each rethought
for this flow against canonical sources:

- source-stubs.md (88ln): source .pyi DERIVED from test bodies (inverse
  of conventional); signature-only ...; FIXED during build (escalate on
  gap); no prescribed layout; config 12-factor; structural artifacts keyed
  to the module (ORM->migration, adapter->cassette); scoped stubtest.

- solid.md (55ln): the 5 SOLID principles (Martin 2000/2003) with a
  violation->smell->fix map; applied when a smell triggers, never
  speculatively.

- object-calisthenics.md (60ln): Jeff Bay's 9 rules (ThoughtWorks
  Anthology 2008) as an exercise-turned-guideline; KT as the reference
  list, Content as the per-rule mechanism table; enforced as the quality
  bar, not a literal counter.

- smell-catalogue.md (82ln): Fowler's (1999) 5 categories with
  Smell/Signal/Fix tables; a Comment is a symptom of code needing
  extraction (which the no-comments policy makes the rule).

Knowledge progress: 12/18 authored (Discover 4/4, Explore 1/1, Plan
software-craft 7/7). Remaining software-craft: Build (tdd, design-patterns,
refactoring-techniques) + Deliver (git-conventions, versioning); plus
requirements/spec-simulation (Plan).

Build-phase software-craft cluster, each rethought for this flow:

- tdd.md (77ln): red/green/refactor ADAPTED — tests pre-exist, so red
  un-marks + confirms the right failure (ImportError=new, assertion=rework);
  minimum code YAGNI/KISS; refactor under green with .pyi frozen + design-only;
  per-contract cycle in outside-in dependency order; scoped stubtest.

- design-patterns.md (64ln): GoF patterns applied only when a smell triggers
  (Gamma 1994, Shvets 2014); foregrounds the small set this flow's architecture
  hosts (Adapter at boundary, Repository, Facade/app-service, Strategy/State,
  Factory, value object); smell->pattern lookup.

- refactoring-techniques.md (87ln): Fowler's (1999) moves organised by problem
  category as a reference index (drops step-by-step mechanics to skill/book);
  .pyi fixed under green; convention compliance is CI's job, not refactor's.
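The per-contract cycle order from tdd.md can be sketched with the standard library's `graphlib`, assuming "outside-in dependency order" means each contract's cycle runs after the contracts it depends on, which is the order the weather-lookup dogfood build followed (settings, weather, history, cli). The dependency map is illustrative.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: contract -> contracts it needs.
DEPENDENCIES: dict[str, set[str]] = {
    "settings": set(),
    "weather": {"settings"},
    "history": {"settings"},
    "cli": {"settings", "weather", "history"},
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """One red/green/refactor cycle per contract, each after everything it depends on."""
    # static_order() raises graphlib.CycleError if the contracts form a cycle.
    return list(TopologicalSorter(deps).static_order())
```

Independent contracts (here `weather` and `history`) can come out in either order; only the dependency edges are guaranteed.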

Knowledge progress: 15/18 authored. Remaining software-craft: git-conventions,
versioning (Deliver); plus requirements/spec-simulation (Plan).

Deliver-phase software-craft cluster; completes all 13 software-craft
knowledges:

- git-conventions.md (75ln): Conventional Commits form; one logical change
  per commit (ship-unit = one contract, .pyi unchanged); refactor separate
  from feature; three-branch model feature->dev->release/main; squash-merge
  into dev gates whole-suite + whole-suite stubtest.

- versioning.md (59ln): SemVer 2.0.0 bump rules + 0.x instability; PEP 440
  for Python (X.Y.Z core, +local stripped by indexes); CalVer when timing
  beats compatibility; pyproject version is single source of truth;
  publish-release picks notes / PR to main / v{version} tag.
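A sketch tying the two knowledges together: parse a Conventional Commits header and derive the SemVer bump it implies. The fix → patch, feat → minor, `!` or `BREAKING CHANGE:` footer → major mapping comes from the Conventional Commits 1.0.0 spec; treating a breaking change under 0.x as a minor bump is an assumption here (SemVer only says anything may change in 0.y.z). All function names are hypothetical.

```python
from __future__ import annotations

import re

# <type>[(scope)][!]: <description>
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?(?P<bang>!)?: (?P<desc>\S.*)$")
BREAKING = re.compile(r"^BREAKING[ -]CHANGE: ", re.MULTILINE)
LEVELS = ("none", "patch", "minor", "major")


def bump_level(message: str) -> str:
    header, _, body = message.partition("\n")
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"not a Conventional Commits header: {header!r}")
    if match["bang"] or BREAKING.search(body):
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match["type"], "none")


def release_level(messages: list[str]) -> str:
    """The highest bump any commit in the release asks for."""
    return max((bump_level(m) for m in messages), key=LEVELS.index, default="none")


def next_version(current: str, level: str) -> str:
    major, minor, patch = (int(part) for part in current.split("."))
    if level == "major" and major == 0:
        level = "minor"  # assumption: breaking changes under 0.x bump the minor
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```

For example, a release containing `fix: a` and `feat: b` on top of `1.4.2` yields `1.5.0`.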

Software-craft cluster complete (13/13). Overall knowledge: 17/18 authored.
Sole remaining cited knowledge: requirements/spec-simulation (Plan phase,
requirements/).

Complete rewrite of spec-simulation, addressing that the prior approach
created JSON I/O files for ceremony and did not catch the e2e-affecting
issues (composition, cross-test coherence):

- spec-simulation.md (105ln): simulation is a MENTAL EXECUTION of the
  contract set (test .pyi + test .py + source .pyi) — the tests ARE the
  spec, there is no domain_spec.md to walk. The e2e failures live in
  composition + cross-test coherence, which tools are blind to: a type
  imported from a module that does not re-export it; one value in two
  shapes across tests; a shared module with no test; a dependency cycle.
  Method: walk the e2e path hop-by-hop (type/value/side-effect tracing to
  a backing contract) + trace each domain value across tests. Tool floor
  (pyright/stubtest/no-orphans/traceability) necessary but not sufficient.
  Output is a judgment + named gaps — no per-walkthrough JSON, no cache.

- simulate-contracts skill step 2 now DIRECTS the walkthrough per the
  leak principle (was a bare 'answer the question'); tool checks kept as
  the floor.

Knowledge cluster COMPLETE: 18/18 authored (Discover 4/4, Explore 1/1,
Plan 8/8, Build 3/3, Deliver 2/2).

Defining-by-negation against artifacts that do not exist in this project
(referencing the backup's mechanisms a reader has never seen) is noise.

- spec-simulation.md: drop 'there is no domain_spec.md', 'not prose',
  'no separate prose document', and the 'old ceremony / per-walkthrough
  JSON / cache directory' contrast passage. Reframe KT6, the 'Judgment'
  concept, and the 'What gets walked' / 'The output' subsections positive.
- source-stubs.md: drop 'skeleton category' / 'skeleton or ports layer'
  (backup hexagonal-ports phantom); state the layout point positive.

Audited all 18 knowledge files: remaining negations ('no human in the
loop', 'no third verdict', 'no way to traverse another aggregate', the
runtime state-change 'no longer') are real constraints, not phantom
contrasts — kept.

Rethink pyproject + tooling per the project's intent: ruff catches real
issues early without enforcing the style/sorting/docstrings that churn on
every refactor.

Ruff:
- Drop the 'conventions' task — the style backdoor that enforced import
  sorting (I), annotations (ANN), naming (N), pycodestyle (E/W), etc. The
  main select was never the culprit; this backdoor was.
- select kept: A ASYNC B C9 DTZ ERA F G LOG PYI RUF S SIM (bug + security +
  simplify; all stable + website-searchable).
- select dropped: FURB (refactor suggestions = rework-on-refactor), PT
  (half stylistic), T20 (fights legitimate CLI prints), preview=true
  (unstable rules aren't reliably searchable).
- per-file-ignores: tests/** -> [S101, S404] (drop vestigial ANN, D);
  drop scripts/*.py (no such dir).

Deps + tooling:
- De-beehave: drop pytest-beehave + [tool.beehave] + the 'deprecated' marker.
- Consolidate to one [project.optional-dependencies].dev; delete
  [dependency-groups].dev. Move flowr from runtime to dev; drop flowr[viz].
- Drop pdoc + ghp-import + the 3 doc tasks (blocked on package; re-add later).
- requires-python >=3.13 (3.12 was the intended floor, but flowr>=1.2.1 requires 3.13).
- CI Python 3.14 -> 3.13 (match the floor).
- release-check = lint && static-check && stubtest && test (was conventions
  && ... && doc-build).
- Identity (name/version/description/readme/urls) kept unchanged per decision.

Verified: uv sync --extra dev clean; ruff check . clean; 6 flows validate.

First real pipeline run (session: dogfood) through discover + explore.

Discover (condensed interview, product-owner role):
- .cache/dogfood/interview-notes.md — 4-level funnel for weather-lookup
  (general/cross-cutting/building-blocks/consolidation); 4 contracts emerge:
  Settings, WeatherAdapter (external), History (persistence), WeatherService
  (CLI e2e); outside-in order; zero gaps.
- docs/glossary.md — 6 terms across weather-lookup + history contexts.

Explore (integration-engineer role; real network probes of open-meteo):
- tests/cassettes/open-meteo/geocoding.yaml — Berlin hit + Xyzqwerty miss
  (200 with no 'results' is the unknown-city error, branched on body not status).
- tests/cassettes/open-meteo/forecast.yaml — Berlin current conditions.
- Cassettes scrubbed: bodies decoded (decode_compressed_response), volatile
  headers stripped (date/server/cf-ray/content-encoding/...).
- .cache/dogfood/{probe-target,probe-research,external-contracts}.md +
  .cache/explore/open-meteo/probe.py (gitignored working state).
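A sketch of branching on the body rather than the status, as the recorded miss requires. It assumes the geocoding response keeps matches under a `results` list of objects carrying `latitude` and `longitude` (as the recorded Berlin hit suggests) and that a miss is a 200 with that key absent; `Coordinates` and `parse_geocoding` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Coordinates:
    latitude: float
    longitude: float


def parse_geocoding(city: str, payload: dict[str, Any]) -> Coordinates:
    # A miss still arrives as HTTP 200, so the status code cannot signal it;
    # a body without "results" is the unknown-city error.
    results = payload.get("results")
    if not results:
        raise LookupError(f"unknown city: {city}")
    first = results[0]
    return Coordinates(latitude=first["latitude"], longitude=first["longitude"])
```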

Validated: pipeline auto-enters discovery subflow; 4-state discover chain;
discover->explore->plan subflow chaining fires on real exits. Session now at
plan-flow/author-test-stubs.

Plan phase of the dogfood pipeline run (session: dogfood).

author-test-stubs -> review -> write-test-py -> derive-source-stubs -> simulate:
- tests/integration/{settings,weather,history}_test.{py,pyi}
- tests/e2e/cli_test.{py,pyi}
  9 tests, all @pytest.mark.pending, deferred SUT imports, skipping cleanly.
- app/{__init__,settings,weather,history,cli}.pyi derived from the test bodies.
- Cassette consolidated to one per service (tests/cassettes/open-meteo/open-meteo.yaml,
  3 interactions) so the e2e flow replays both calls from a single file; drops the
  per-endpoint geocoding.yaml + forecast.yaml.

Coherence fix surfaced by the simulate walkthrough: the forecast interaction was
recorded with geocoded coords (52.52437/13.41053) so geocode->forecast shares one
coordinate shape across the forecast test, the e2e test, and the cassette (vcr
matches on query). Forecast value 16.1/6.5/weather_code 0.
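Why one coordinate shape matters: vcrpy's default request matching includes the query string, so a forecast URL built from rounded coordinates is a different request from the one in the cassette and will not replay. A sketch, assuming the forecast request carries `latitude`/`longitude` query parameters (the `current` parameter value is illustrative) and using the geocoded Berlin values recorded above; `forecast_query` is a hypothetical helper.

```python
from urllib.parse import urlencode


def forecast_query(latitude: float, longitude: float) -> str:
    # Build the query from the geocoded floats as-is; any rounding or reformatting
    # here produces a query the recorded cassette does not contain.
    return urlencode(
        {
            "latitude": latitude,
            "longitude": longitude,
            "current": "temperature_2m,wind_speed_10m,weather_code",
        }
    )
```

The helper is incidental; the point is that the forecast test, the e2e test, and the cassette all carry the same coordinate literal.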

Gates: pyright 0 errors (9 reportMissingModuleSource diagnostics, expected because there is no source .py
yet); stubtest on test pairs: Success; 6 evidence items asserted at simulate -> contracts-ready.
Session now at tdd-flow/select.

Frozen dataclass + from_env classmethod reading WEATHER_GEOCODING_BASE / WEATHER_FORECAST_BASE / DATABASE_URL with open-meteo defaults. Un-marks the 2 settings tests.
httpx client; geocode raises LookupError on unknown city (body has no 'results'); forecast returns Conditions. Replays tests/cassettes/open-meteo. Adds httpx as a project dependency.
Normalizes database_url (strips sqlite:/// prefix); record inserts; recent returns LookupRecords latest-first (ORDER BY id DESC).
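A stdlib sketch of the `History` behaviour summarized above, assuming a single `lookups` table in `sqlite3`; the table, columns, `limit` parameter, and `LookupRecord` fields are hypothetical. Stripping the `sqlite:///` prefix leaves a plain path (or `:memory:`) for `sqlite3.connect`, and `ORDER BY id DESC` returns the latest lookup first.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LookupRecord:
    city: str
    temperature: float


def normalize_database_url(url: str) -> str:
    prefix = "sqlite:///"
    return url[len(prefix):] if url.startswith(prefix) else url


class History:
    def __init__(self, database_url: str) -> None:
        self._conn = sqlite3.connect(normalize_database_url(database_url))
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS lookups ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, city TEXT NOT NULL, temperature REAL NOT NULL)"
        )

    def record(self, city: str, temperature: float) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO lookups (city, temperature) VALUES (?, ?)", (city, temperature)
            )

    def recent(self, limit: int = 10) -> list[LookupRecord]:
        rows = self._conn.execute(
            "SELECT city, temperature FROM lookups ORDER BY id DESC LIMIT ?", (limit,)
        ).fetchall()
        return [LookupRecord(city, temperature) for city, temperature in rows]
```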
main composes Settings.from_env + WeatherAdapter + History; geocode -> forecast -> record -> print. E2E replays the open-meteo cassette.

- Revert dogfood instance from the template root (app/, tests/, docs/glossary.md,
  cassettes); it lives in /tmp as a separate instance project.
- New knowledge: research folder (29 source cards + card template + research-files
  knowledge), secrets-and-config (12-factor secrets/config split, LLM-agent threat
  model, dotenv_values over load_dotenv, agent instruct/ask protocol), design-patterns
  rework (language-agnostic 22-pattern catalog), and the journal/plan->explore
  escalation wiring (needs-capture exit + cross-phase journal.md).
- New templates: glossary, state (living spec), README, research card, ADR, .env.example.
- ADR model (docs/decisions/2026-07-02-use-pyi-first-contracts.md) + standalone
  record-decision skill + system-architect identity carrying the ADR-awareness.
- Flow description slim (orientation kernels); input artifacts point at .pyi; skill
  and knowledge cross-links de-duplicated.
- CI: gitleaks secret-scanner job; ruff bug-catcher select (no docstrings, D dropped).
- pyproject: de-beehave, consolidated dev deps, requires-python >=3.13, httpx reverted.

…ture layers

- copier instantiator: copier.yml, pyproject.toml.jinja, README.md.jinja,
  project-instantiator agent, instantiate-project skill
- secrets-and-config: dotenv_values over load_dotenv, out-of-workspace
  ~/.secrets/<project>.env, agent instruct/ask protocol, .env.example template,
  gitleaks CI job
- design domain: interaction/visual/asset/cli/api/accessibility knowledges +
  design-interaction, design-visual-asset skills + social-card/logo SVG templates
- architecture domain: quality-attributes, context-mapping
- writing domain: ai-language-markers (Kobak 2024, Jackson 2026)
- docstring lifecycle: phased model (stripped at select, regenerated at merge),
  scripts/strip_docstrings.py, dev/merge lint split
- workflow domain: flowr-operations knowledge
- AGENTS.md lean rewrite: operating discipline + driving-a-state loop
- research folder: writing/architecture/design/process cards
- drop TODO.md and stale .templates/README.md.template
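A minimal sketch of what a docstring-stripping pass such as scripts/strip_docstrings.py could do with the standard library's `ast`; this is an assumption about the approach, not the script's actual contents. A caveat of this technique: `ast.unparse` discards comments and normalizes formatting, which fits the no-comments policy but would rewrite layout.

```python
import ast


def _is_docstring(node: ast.stmt) -> bool:
    return (
        isinstance(node, ast.Expr)
        and isinstance(node.value, ast.Constant)
        and isinstance(node.value.value, str)
    )


def strip_docstrings(source: str) -> str:
    """Remove module, class, and function docstrings; keep every other statement."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.body and _is_docstring(node.body[0]):
                # A class or function body cannot be empty, so fall back to `...`.
                node.body = node.body[1:] or [ast.Expr(value=ast.Constant(value=..., kind=None))]
    return ast.unparse(ast.fix_missing_locations(tree))
```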