refactor: clean-slate methodology template rebuild (copier + staged-contract pipeline) #187
Open
nullhack wants to merge 19 commits into
Replaces the removed spec-driven/beehave orchestration layer on the refactor/clean-slate branch (old layer preserved in .backup/, recoverable from origin/main).
Flow set (.flowr/flows/):
- pipeline-flow orchestrates discover -> explore -> plan -> build -> deliver -> shipped
- 5 subflows; 21 inner states; one owner and one skill per state
- staged-contract surface: test .pyi -> test .py (@pending) -> source .pyi -> simulate
- @pytest.mark.pending is the backlog signal and skip mechanism
- stubtest (mypy.stubtest) drift-checks source and test .pyi at every gate
Methodology (.opencode/):
- knowledge/methodology/: separation-of-concerns, agent-files, skill-files, knowledge-files
- agents/: 11-agent roster (5 primary owners, 4 specialist doers, 2 consult-only)
- skills/: 21 lean per-state procedures (Load step, numbered, IF-THEN/wikilink cues)
- knowledge/requirements/interview-techniques.md authored (shared funnel technique)
Tooling and CI:
- pyproject: PYI added to ruff select; D/pydocstyle dropped (no-docstrings policy); mypy + vcrpy/pytest-vcr dev deps; pending marker registered; stubtest task
- .github/workflows/ci.yml: ruff + flowr validate always-on; pyright/stubtest/pytest guarded on package+tests presence (evidence vs enforcement)
- conftest.py: pending-marker skip hook
- .templates/docs/glossary.md.template: ubiquitous-language glossary skeleton
Docs: AGENTS.md rewritten for the new lifecycle; TODO.md tracks open work.
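A minimal sketch of what a pending-marker skip hook in conftest.py could look like, assuming the `pending` marker is registered in pyproject as described above. The hook body and the reason string are illustrative assumptions, not the hook actually committed in this PR.

```python
import pytest

# Hypothetical reason text; the real hook may word this differently.
PENDING_REASON = "pending: contract not implemented yet"


def pytest_collection_modifyitems(config, items):
    # Attach a skip marker to every collected test that carries
    # @pytest.mark.pending, so the backlog stays visible but never runs.
    skip_pending = pytest.mark.skip(reason=PENDING_REASON)
    for item in items:
        if item.get_closest_marker("pending") is not None:
            item.add_marker(skip_pending)
```

Under this sketch, un-marking a test (removing `@pytest.mark.pending`) is all it takes to bring it into the run, which matches the marker's role as both backlog signal and skip mechanism.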
Drop interview-techniques from interview-features and consolidate-interview load lists — those levels do structural decomposition and synthesis, not CIT/Laddering elicitation, so loading the shared technique bled undirected elicitation into them. interview-techniques is now cited only by the two elicitation levels (interview-general, interview-cross-cutting), which direct every technique they load. Update TODO with the missing tracks: flow design debt (plan-flow rework gaps), pyproject+tooling cleanup, docs (deleted-but-referenced README), and the knowledge-to-author list.
…rename
Author the DDD decomposition knowledges and retitle the funnel's third
level off the product term "feature" onto the DDD term "building block".
- Add requirements/domain-decomposition.md (aggregate-first decomposition,
gap analysis as coverage matrix) and requirements/aggregate-boundaries.md
(sizing/splitting aggregates); repoint citations in interview-techniques
and the interview/review skills
- Rename interview-features -> interview-building-blocks: state, skill,
trigger (building-blocks-done, aligned with general/cross-cutting-done),
description ("Tactical decomposition"); drop the contradictory L2 step
- Reframe agent identities off "feature" -> "building block" across
domain-expert, product-owner, reviewer, release-engineer
- discovery-flow 2.1.0 -> 2.2.0
- derive-source-stubs input: test .pyi -> test .py (bodies hold the domain
types the stubs are derived from)
- Rationalize working-state paths: probe scripts explore/ -> .cache/explore/
(consolidates gitignored working state); rewrite .gitignore lean and
project-only; add a Project layout section to AGENTS.md
…rch bar
Author two domain knowledges and encode the depth/research authoring bar into the methodology.
- requirements/ubiquitous-language.md: shared rigorous vocabulary as a translation-killer (Evans 2003; Fowler 2006); discovered-then-conquered not transcribed (Avanscoperta; Protean); one meaning per bounded context; genus-differentia; language and code co-evolve (Shore); curated not exhaustive. Completes the discover phase (4/4)
- software-craft/external-fixtures.md: record-once-replay-forever (vcrpy record_mode=once; CI VCR_RECORD_MODE=none); kind-dispatch table as the spine (vcrpy HTTP-only; DB/queue/storage capture strategies); two scrubs (safety + determinism); capture-is-truth. Completes the explore phase (1/1)
- methodology/knowledge-files.md: add two concepts -- depth deepens across the four sections (Content never thinner than Concepts; Diataxis/arc42/Divio/Williams) and grounding/research (research the topic, cite inline; inaccurate knowledge is worse than none)
Three software-craft knowledges for the plan-phase review and authoring skills, researched against canonical sources and rethought for this flow:
- test-stubs.md (99ln): PEP 484 .pyi signature files for tests, disambiguated from the Meszaros "Test Stub" test double. Documents why the .pyi-preferred rule hides drift from pyright (PEP 484/mypy/PEP 561), why stubtest is the sole detector (runtime inspect diff, structure not return-accuracy), the complete-module-surface rule, and stdlib-typing-only with types under TYPE_CHECKING.
- test-design.md (85ln): observable behaviour not implementation (Meszaros); sociable tests at two grains only: integration (narrow) and E2E (broad), no solitary unit (Fowler, Vocke); one behaviour per test; spec value fidelity; property tests (Hypothesis) prove invariants (MacIver).
- code-review.md (85ln): adversarial review (Fagan 1976, Tetlock), fail-fast at the first defect, report-only (minor is not a pass), two review modes (review-test-stubs = coverage/scope/happy-path vs interview; review-implementation = correctness/quality/drift/green), PASS/FAIL record per criterion. Conventions (ruff/docstrings) are CI's job, not review's.
Knowledge progress: 8/18 authored (Discover 4/4, Explore 1/1, Plan 3/8).
…atalogue
Completes the Plan-phase software-craft cluster (7/7). Each rethought for this flow against canonical sources:
- source-stubs.md (88ln): source .pyi DERIVED from test bodies (inverse of conventional); signature-only ...; FIXED during build (escalate on gap); no prescribed layout; config 12-factor; structural artifacts keyed to the module (ORM->migration, adapter->cassette); scoped stubtest.
- solid.md (55ln): the 5 SOLID principles (Martin 2000/2003) with a violation->smell->fix map; applied when a smell triggers, never speculatively.
- object-calisthenics.md (60ln): Jeff Bay's 9 rules (ThoughtWorks Anthology 2008) as an exercise-turned-guideline; KT as the reference list, Content as the per-rule mechanism table; enforced as the quality bar, not a literal counter.
- smell-catalogue.md (82ln): Fowler's (1999) 5 categories with Smell/Signal/Fix tables; a Comment is a symptom of code needing extraction (which the no-comments policy makes the rule).
Knowledge progress: 12/18 authored (Discover 4/4, Explore 1/1, Plan software-craft 7/7).
Remaining software-craft: Build (tdd, design-patterns, refactoring-techniques) + Deliver (git-conventions, versioning); plus requirements/spec-simulation (Plan).
Build-phase software-craft cluster, each rethought for this flow:
- tdd.md (77ln): red/green/refactor ADAPTED: tests pre-exist, so red un-marks + confirms the right failure (ImportError=new, assertion=rework); minimum code YAGNI/KISS; refactor under green with .pyi frozen + design-only; per-contract cycle in outside-in dependency order; scoped stubtest.
- design-patterns.md (64ln): GoF patterns applied only when a smell triggers (Gamma 1994, Shvets 2014); foregrounds the small set this flow's architecture hosts (Adapter at boundary, Repository, Facade/app-service, Strategy/State, Factory, value object); smell->pattern lookup.
- refactoring-techniques.md (87ln): Fowler's (1999) moves organised by problem category as a reference index (drops step-by-step mechanics to skill/book); .pyi fixed under green; convention compliance is CI's job, not refactor's.
Knowledge progress: 15/18 authored.
Remaining software-craft: git-conventions, versioning (Deliver); plus requirements/spec-simulation (Plan).
Deliver-phase software-craft cluster; completes all 13 software-craft
knowledges:
- git-conventions.md (75ln): Conventional Commits form; one logical change
per commit (ship-unit = one contract, .pyi unchanged); refactor separate
from feature; three-branch model feature->dev->release/main; squash-merge
into dev gates whole-suite + whole-suite stubtest.
- versioning.md (59ln): SemVer 2.0.0 bump rules + 0.x instability; PEP 440
for Python (X.Y.Z core, +local stripped by indexes); CalVer when timing
beats compatibility; pyproject version is single source of truth;
publish-release picks notes / PR to main / v{version} tag.
Software-craft cluster complete (13/13). Overall knowledge: 17/18 authored.
Sole remaining cited knowledge: requirements/spec-simulation (Plan phase,
requirements/).
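The SemVer 2.0.0 bump rules summarised for versioning.md can be sketched as a small helper. This is an illustrative assumption about how a bump might be computed, not code from the template; `bump` is a hypothetical name, and the sketch handles only a plain X.Y.Z core (a `+local` suffix is dropped first, pre-release tags are out of scope).

```python
def bump(version: str, change: str) -> str:
    # Drop any "+local" build/local segment and keep the X.Y.Z core.
    core = version.split("+", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))

    # SemVer 2.0.0: a major bump resets minor and patch,
    # a minor bump resets patch, a patch bump touches only patch.
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")
```

For example, `bump("1.4.2", "minor")` yields `"1.5.0"`. How 0.x releases map breaking changes onto a bump level is a project policy choice that this sketch deliberately leaves out.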
Complete rewrite of spec-simulation, addressing that the prior approach created JSON I/O files for ceremony and did not catch the e2e-affecting issues (composition, cross-test coherence):
- spec-simulation.md (105ln): simulation is a MENTAL EXECUTION of the contract set (test .pyi + test .py + source .pyi); the tests ARE the spec, there is no domain_spec.md to walk. The e2e failures live in composition + cross-test coherence, which tools are blind to: a type imported from a module that does not re-export it; one value in two shapes across tests; a shared module with no test; a dependency cycle. Method: walk the e2e path hop-by-hop (type/value/side-effect tracing to a backing contract) + trace each domain value across tests. Tool floor (pyright/stubtest/no-orphans/traceability) necessary but not sufficient. Output is a judgment + named gaps: no per-walkthrough JSON, no cache.
- simulate-contracts skill step 2 now DIRECTS the walkthrough per the leak principle (was a bare 'answer the question'); tool checks kept as the floor.
Knowledge cluster COMPLETE: 18/18 authored (Discover 4/4, Explore 1/1, Plan 8/8, Build 3/3, Deliver 2/2).
Defining-by-negation against artifacts that do not exist in this project
(referencing the backup's mechanisms a reader has never seen) is noise.
- spec-simulation.md: drop 'there is no domain_spec.md', 'not prose',
'no separate prose document', and the 'old ceremony / per-walkthrough
JSON / cache directory' contrast passage. Reframe KT6, the 'Judgment'
concept, and the 'What gets walked' / 'The output' subsections positive.
- source-stubs.md: drop 'skeleton category' / 'skeleton or ports layer'
(backup hexagonal-ports phantom); state the layout point positive.
Audited all 18 knowledge files: remaining negations ('no human in the
loop', 'no third verdict', 'no way to traverse another aggregate', the
runtime state-change 'no longer') are real constraints, not phantom
contrasts — kept.
Rethink pyproject + tooling per the project's intent: ruff catches real issues early without enforcing the style/sorting/docstrings that churn on every refactor.
Ruff:
- Drop the 'conventions' task, the style backdoor that enforced import sorting (I), annotations (ANN), naming (N), pycodestyle (E/W), etc. The main select was never the culprit; this backdoor was.
- select kept: A ASYNC B C9 DTZ ERA F G LOG PYI RUF S SIM (bug + security + simplify; all stable + website-searchable).
- select dropped: FURB (refactor suggestions = rework-on-refactor), PT (half stylistic), T20 (fights legitimate CLI prints), preview=true (unstable rules aren't reliably searchable).
- per-file-ignores: tests/** -> [S101, S404] (drop vestigial ANN, D); drop scripts/*.py (no such dir).
Deps + tooling:
- De-beehive: drop pytest-beehave + [tool.beehave] + the 'deprecated' marker.
- Consolidate to one [project.optional-dependencies].dev; delete [dependency-groups].dev. Move flowr from runtime to dev; drop flowr[viz].
- Drop pdoc + ghp-import + the 3 doc tasks (blocked on package; re-add later).
- requires-python >=3.13 (flowr>=1.2.1 requires it; 3.12 was the intent).
- CI Python 3.14 -> 3.13 (match the floor).
- release-check = lint && static-check && stubtest && test (was conventions && ... && doc-build).
- Identity (name/version/description/readme/urls) kept unchanged per decision.
Verified: uv sync --extra dev clean; ruff check . clean; 6 flows validate.
First real pipeline run (session: dogfood) through discover + explore.
Discover (condensed interview, product-owner role):
- .cache/dogfood/interview-notes.md — 4-level funnel for weather-lookup
(general/cross-cutting/building-blocks/consolidation); 4 contracts emerge:
Settings, WeatherAdapter (external), History (persistence), WeatherService
(CLI e2e); outside-in order; zero gaps.
- docs/glossary.md — 6 terms across weather-lookup + history contexts.
Explore (integration-engineer role; real network probes of open-meteo):
- tests/cassettes/open-meteo/geocoding.yaml — Berlin hit + Xyzqwerty miss
(200 with no 'results' is the unknown-city error, branched on body not status).
- tests/cassettes/open-meteo/forecast.yaml — Berlin current conditions.
- Cassettes scrubbed: bodies decoded (decode_compressed_response), volatile
headers stripped (date/server/cf-ray/content-encoding/...).
- .cache/dogfood/{probe-target,probe-research,external-contracts}.md +
.cache/explore/open-meteo/probe.py (gitignored working state).
Validated: pipeline auto-enters discovery subflow; 4-state discover chain;
discover->explore->plan subflow chaining fires on real exits. Session now at
plan-flow/author-test-stubs.
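The header scrub described in the Explore notes can be sketched as a response filter. This is a hedged illustration, not the probe script from this run: it assumes vcrpy's recorded-response shape (a dict whose `headers` entry maps names to lists of values), `scrub_response` is a hypothetical name, and the header set lists only the names spelled out above.

```python
import copy
from typing import Any

# Only the volatile headers named in the notes; the real list is longer.
VOLATILE_HEADERS = frozenset({"date", "server", "cf-ray", "content-encoding"})


def scrub_response(response: dict[str, Any]) -> dict[str, Any]:
    # Work on a copy so the live response object is left untouched.
    scrubbed = copy.deepcopy(response)
    scrubbed["headers"] = {
        name: values
        for name, values in scrubbed.get("headers", {}).items()
        # Header names are compared case-insensitively.
        if name.lower() not in VOLATILE_HEADERS
    }
    return scrubbed
```

A function of this shape could be handed to vcrpy as `before_record_response`, alongside `decode_compressed_response=True` and `record_mode="once"`, so that cassettes stay readable and stable between recordings.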
Plan phase of the dogfood pipeline run (session: dogfood).
author-test-stubs -> review -> write-test-py -> derive-source-stubs -> simulate:
- tests/integration/{settings,weather,history}_test.{py,pyi}
- tests/e2e/cli_test.{py,pyi}
9 tests, all @pytest.mark.pending, deferred SUT imports, skipping cleanly.
- app/{__init__,settings,weather,history,cli}.pyi derived from the test bodies.
- Cassette consolidated to one per service (tests/cassettes/open-meteo/open-meteo.yaml,
3 interactions) so the e2e flow replays both calls from a single file; drops the
per-endpoint geocoding.yaml + forecast.yaml.
Coherence fix surfaced by the simulate walkthrough: the forecast interaction was
recorded with geocoded coords (52.52437/13.41053) so geocode->forecast shares one
coordinate shape across the forecast test, the e2e test, and the cassette (vcr
matches on query). Forecast value 16.1/6.5/weather_code 0.
Gates: pyright 0 errors (9 reportMissingModuleSource = expected, no source .py
yet); stubtest on test pairs Success; 6 evidence asserted at simulate -> contracts-ready.
Session now at tdd-flow/select.
Frozen dataclass + from_env classmethod reading WEATHER_GEOCODING_BASE / WEATHER_FORECAST_BASE / DATABASE_URL with open-meteo defaults. Un-marks the 2 settings tests.
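A minimal sketch of the Settings shape this commit describes, under stated assumptions: the field names, the default base URLs (taken here to be the public open-meteo hosts), the default SQLite path, and the optional `environ` parameter are all illustrative, not copied from the dogfood instance.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

# Assumed defaults; the instance project's actual values may differ.
DEFAULT_GEOCODING_BASE = "https://geocoding-api.open-meteo.com/v1"
DEFAULT_FORECAST_BASE = "https://api.open-meteo.com/v1"
DEFAULT_DATABASE_URL = "sqlite:///history.db"


@dataclass(frozen=True)
class Settings:
    geocoding_base: str
    forecast_base: str
    database_url: str

    @classmethod
    def from_env(cls, environ: Mapping[str, str] | None = None) -> Settings:
        # Accepting a mapping keeps tests independent of the real environment.
        env = os.environ if environ is None else environ
        return cls(
            geocoding_base=env.get("WEATHER_GEOCODING_BASE", DEFAULT_GEOCODING_BASE),
            forecast_base=env.get("WEATHER_FORECAST_BASE", DEFAULT_FORECAST_BASE),
            database_url=env.get("DATABASE_URL", DEFAULT_DATABASE_URL),
        )
```

Because the dataclass is frozen, any later attempt to reassign a field raises `dataclasses.FrozenInstanceError`, so configuration is read once at the edge and passed inward unchanged.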
httpx client; geocode raises LookupError on unknown city (body has no 'results'); forecast returns Conditions. Replays tests/cassettes/open-meteo. Adds httpx as a project dependency.
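The unknown-city branch keys on the response body rather than the status code. A small sketch of that check on an already-decoded geocoding body, assuming the `results` entries carry `latitude` and `longitude` fields; `coordinates_from_geocoding` is a hypothetical helper, not the adapter's actual method, and the httpx call itself is left out.

```python
from typing import Any


def coordinates_from_geocoding(city: str, body: dict[str, Any]) -> tuple[float, float]:
    # A 200 response with no "results" key (or an empty list)
    # is how an unknown city shows up.
    results = body.get("results")
    if not results:
        raise LookupError(f"unknown city: {city}")
    top = results[0]
    return float(top["latitude"]), float(top["longitude"])
```

Keeping the body check in a pure function like this would let the cassette-backed test and a plain unit-level assertion exercise the same branch.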
Normalizes database_url (strips sqlite:/// prefix); record inserts; recent returns LookupRecords latest-first (ORDER BY id DESC).
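One way the History contract could be realised with the standard-library sqlite3 module. This is a sketch under assumptions: the table name, the columns, and the `LookupRecord` fields are invented for illustration, and only the prefix stripping and the `ORDER BY id DESC` ordering come from the commit message.

```python
import sqlite3
from dataclasses import dataclass

SQLITE_PREFIX = "sqlite:///"


@dataclass(frozen=True)
class LookupRecord:
    # Hypothetical fields; the real record shape is not shown in this PR.
    id: int
    city: str
    temperature: float


class History:
    def __init__(self, database_url: str) -> None:
        # "sqlite:///history.db" -> "history.db"; other inputs pass through.
        path = database_url.removeprefix(SQLITE_PREFIX)
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS lookups ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "city TEXT NOT NULL, "
            "temperature REAL NOT NULL)"
        )
        self._conn.commit()

    def record(self, city: str, temperature: float) -> None:
        # The connection context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT INTO lookups (city, temperature) VALUES (?, ?)",
                (city, temperature),
            )

    def recent(self, limit: int = 10) -> list[LookupRecord]:
        rows = self._conn.execute(
            "SELECT id, city, temperature FROM lookups ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
        return [LookupRecord(*row) for row in rows]
```

Ordering by the autoincrement id rather than a timestamp keeps latest-first deterministic even when two lookups land within the same clock tick.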
main composes Settings.from_env + WeatherAdapter + History; geocode -> forecast -> record -> print. E2E replays the open-meteo cassette.
- Revert dogfood instance from the template root (app/, tests/, docs/glossary.md, cassettes); it lives in /tmp as a separate instance project.
- New knowledge: research folder (29 source cards + card template + research-files knowledge), secrets-and-config (12-factor secrets/config split, LLM-agent threat model, dotenv_values over load_dotenv, agent instruct/ask protocol), design-patterns rework (language-agnostic 22-pattern catalog), and the journal/plan->explore escalation wiring (needs-capture exit + cross-phase journal.md).
- New templates: glossary, state (living spec), README, research card, ADR, .env.example.
- ADR model (docs/decisions/2026-07-02-use-pyi-first-contracts.md) + standalone record-decision skill + system-architect identity carrying the ADR-awareness.
- Flow description slim (orientation kernels); input artifacts point at .pyi; skill and knowledge cross-links de-duplicated.
- CI: gitleaks secret-scanner job; ruff bug-catcher select (no docstrings, D dropped).
- pyproject: de-beehave, consolidated dev deps, requires-python >=3.13, httpx reverted.
…ture layers
- copier instantiator: copier.yml, pyproject.toml.jinja, README.md.jinja, project-instantiator agent, instantiate-project skill
- secrets-and-config: dotenv_values over load_dotenv, out-of-workspace ~/.secrets/<project>.env, agent instruct/ask protocol, .env.example template, gitleaks CI job
- design domain: interaction/visual/asset/cli/api/accessibility knowledges + design-interaction, design-visual-asset skills + social-card/logo SVG templates
- architecture domain: quality-attributes, context-mapping
- writing domain: ai-language-markers (Kobak 2024, Jackson 2026)
- docstring lifecycle: phased model (stripped at select, regenerated at merge), scripts/strip_docstrings.py, dev/merge lint split
- workflow domain: flowr-operations knowledge
- AGENTS.md lean rewrite: operating discipline + driving-a-state loop
- research folder: writing/architecture/design/process cards
- drop TODO.md and stale .templates/README.md.template
Summary
A clean-slate rebuild of temple8 as a pure methodology template (copier-based), replacing the prior beehave/BDD orchestration layer with a staged-contract pipeline wired into flowr.
What's in
- methodology/, workflow/, requirements/, software-craft/, architecture/, writing/, design/.
- .pyi → test .py (@pytest.mark.pending) → source .pyi → simulate; build implements source .py from the fixed .pyi one cycle at a time; mypy.stubtest is the sole .pyi/.py drift detector.
- dotenv_values() into a frozen Settings (never os.environ); secrets out-of-workspace at ~/.secrets/<project>.env; agent instruct/ask-never-create protocol; gitleaks in CI.
- copier.yml questionnaire, pyproject.toml.jinja + README.md.jinja, project-instantiator agent + instantiate-project skill; instances start empty of source/tests by design.
- select, regenerated at merge-to-dev; dev lint = bug-catchers, merge lint adds SIM/RUF + format.
Verification
- {valid:true}
- ruff check . clean
- .jinja files render with zero leftover markers; conditional _exclude logic verified both ways
- copier copy smoke-test deferred to post-merge (template files now committed)
Notes
3ba8599); its commits remain in history but net to nothing.