Self-validating harness

A CLI that ships with its own marketing-claim audit

Run bash qa/run.sh. Eleven phases sweep the codebase, the adversarial primitives, and every public promise on the website. Pass / fail / skip verdicts land in a stamped JSONL report — auditors and buyers can replay it without ever talking to us.

ELI5 — the cereal box that checks itself

You print "Contains 12 marshmallow shapes, made in Vermont, sugar-free" on the side of a cereal box. A year later the recipe changes; the box still says 12 even though there are now 7. A self-validating harness is the robot that walks through the actual cereal every morning, counts the shapes, taste-tests for sugar — and refuses to ship the box if any line is wrong.

Three claims, three checks

Count claim
"7 bundled agents"
ls packages/fp/assets/agents/*.md | wc -l == 7
Presence claim
"Stanfis attribution on the trust portal footer"
grep "Stanfis" packages/trust-portal/src/components/Footer.astro
Numeric claim
"24 of 48 framework cells validated"
jq '[.controls[]|select(.verified)]|length == 24' validated-controls.json

What each run sweeps

Marketing-claim audit

Every public promise on the website is encoded in qa/claims.yml as a machine-checkable assertion against the live codebase. Change the website to claim "10 agents" but ship 7 → the build fails.

Adversarial bypass

A dedicated phase tampers with audit-chain entries, forges signatures, swaps run-manifest fields, and demands the system reject loudly. Every tamper that succeeds silently fails the build.

Documentation parity

Every command in the help screen must be wired in cli.js, and vice versa. Every CSS class used by a marketing component must have a definition. Drift between docs and code fails the build.

Why nobody else has shipped this

Unit tests, e2e tests, integration tests — every shop has those. What no CLI ships with is its own marketing-claim audit, running against itself on every commit. The result is a procurement and security artifact your buyers can run themselves: one command, a stamped report, no 6-week questionnaire cycle.

Run it yourself
$ bash qa/run.sh
  ── Phase 0  Install   ─ 6 passed, 0 failed
  ── Phase 1  F0 funda… ─ 12 passed, 0 failed
  ── Phase 2  F1 govern… ─ 14 passed, 0 failed
  ── …
  ── Phase 10 Docs parity ─ 7 passed, 0 failed
                         total: 139 passed, 0 failed, 1 skipped
  Report:  qa/reports/run-2026-05-04T01-38-49Z/summary.md
  JSONL:   qa/reports/run-2026-05-04T01-38-49Z/results.jsonl
Where it lives: qa/run.sh + qa/claims.yml + 11 phase scripts under qa/phases/
Current verdict: 139 passed, 0 failed, 1 skipped (intentional)