Run bash qa/run.sh. Eleven phases sweep the codebase, the
adversarial primitives, and every public promise on the website. Pass /
fail / skip verdicts land in a stamped JSONL report — auditors and
buyers can replay it without ever talking to us.
You print "Contains 12 marshmallow shapes, made in Vermont, sugar-free" on the side of a cereal box. A year later the recipe changes; the box still says 12 even though there are now 7. A self-validating harness is the robot that walks through the actual cereal every morning, counts the shapes, taste-tests for sugar — and refuses to ship the box if any line is wrong.
ls packages/fp/assets/agents/*.md | wc -l == 7 grep "Stanfis" packages/trust-portal/src/components/Footer.astro jq '[.controls[]|select(.verified)]|length == 24' validated-controls.json Every public promise on the website is encoded in qa/claims.yml as a machine-checkable assertion against the live codebase. Change the website to claim "10 agents" but ship 7 → the build fails.
A dedicated phase tampers with audit-chain entries, forges signatures, swaps run-manifest fields, and demands the system reject loudly. Every tamper that succeeds silently fails the build.
Every command in the help screen must be wired in cli.js, and vice versa. Every CSS class used by a marketing component must have a definition. Drift between docs and code fails the build.
Unit tests, e2e tests, integration tests — every shop has those. What no CLI ships with is its own marketing-claim audit, running against itself on every commit. The result is a procurement and security artifact your buyers can run themselves: one command, a stamped report, no 6-week questionnaire cycle.
$ bash qa/run.sh
── Phase 0 Install ─ 6 passed, 0 failed
── Phase 1 F0 funda… ─ 12 passed, 0 failed
── Phase 2 F1 govern… ─ 14 passed, 0 failed
── …
── Phase 10 Docs parity ─ 7 passed, 0 failed
total: 139 passed, 0 failed, 1 skipped
Report: qa/reports/run-2026-05-04T01-38-49Z/summary.md
JSONL: qa/reports/run-2026-05-04T01-38-49Z/results.jsonl