Basis Experiment Status

Status table

| Experiment | Hypothesis | Evidence inspected | Current status | Next public milestone |
| --- | --- | --- | --- | --- |
| Core Basis reducer runtime | Overcomplete prose specs can be reduced toward structured state with provenance and explicit projection targets. | /Users/ericfode/src/basis, HEAD a5544e0; spec.md; reducer/imaginer specs; Elixir source/tests. | Active baseline. Root is not where the recovered Reduce/Imagine spikes live. | Keep core gates green; consolidate recovered work through reviewable branches. |
| Basis.Reduce workbench | Source-backed review UI can expose sentence evidence, projection impact, and decision actions without accepting model output as truth. | /Users/ericfode/.codex/worktrees/4e6b/basis, branch codex/inspect-reducer-eval, HEAD ceb8df8; generated UI assets; reducer control contract. | Branch is tracked and recent UI commits are coherent; pathology study is untracked. | Decide whether to commit/split/prune spec-pathology-study/; preserve control-contract gate. |
| Basis.Imagine workbench | Plan-space futures can be compared as proposal-only implementation choices before human acceptance. | /Users/ericfode/.codex/worktrees/95ae/basis, branch codex/start-imaginer; imaginer.html; future-tradeoffs.js; runtime/provider/test diffs. | Active dirty spike. Mix tests and JS checks pass, but patch is large and unreviewed. | Stabilize into a reviewable patch; keep Packet Schema First as default unless evidence beats it. |
| Basis Hermes plugin | Basis reduction can be exposed as Hermes tools without provider-schema breakage. | /Users/ericfode/src/basis-hermes, HEAD 0061d32; plugin manifest; Python reducer; dashboard API; tests. | Strongest clean integration surface. Python and JS gates pass. | Add public/synthetic golden examples and stable packet contract docs. |
| Jcode self-convergence | Model/human worker packets can drive a resumable reducer loop with validation, convergence judgement, dashboard attention, and explicit acceptance. | /Users/ericfode/src/basis-jcode/components/spec-basis-reducer; HEAD 4b1e621; .basis/self-convergence inspected by counts only. | Richest prior experiment, but dirty/ahead and public-sensitive. 26 tests pass. | Resolve dirty state; generate public-safe run summary without packet bodies. |
| Spec-pathology study | Reducer/imaginer usefulness should be tested by behavior, process cost, intervention controls, and competence floors. | Untracked components/spec-basis-reducer/experiments/spec-pathology-study/; reward state and JSON score reports. | Important but not canonical. Large wave is complete but inconclusive because clean control failed below competence floor. | Add competence-floor gate; repeat only after baseline builders can pass the locked evaluator. |
| Smoke fixture reducer | The control plane can validate packet contracts without live model calls. | .basis/smoke names/counts and test suite. | Useful deterministic fixture lane. | Keep as regression gate; do not delete existing smoke run casually. |
| Spec-code grounding pressure | Basis records should help connect specs to code/tests/diffs, not merely look structured. | spec-dataset-evolution-research-project, spec-deep-dive-index, local steward design repo. | Design-stage external pressure. | Run a small public/synthetic connectedness benchmark with Basis records as features. |

How it is going

The honest answer is: promising, but still pre-consolidation.

Good signs:

  • the core Basis contract is crisp about proposal versus acceptance;
  • basis-hermes is clean and test-passing;
  • Codex tool-schema compatibility was repaired and recorded;
  • basis-reduce-workbench has a tracked review/decision UI branch with a passing control-binding gate;
  • the spec-pathology study measures final behavior, process cost, reducer prediction, placebo/noisy/wrong controls, and competence-floor failure separately;
  • basis-imagine-workbench has a dedicated imaginer URL, future-tradeoff model, and app-server-backed lens/runtime hooks;
  • basis-jcode has meaningful test coverage for production packet provenance, convergence, repair, dashboard decisions, and acceptance boundaries;
  • the self-convergence run has enough artifact structure to be worth studying internally.

Friction signs:

  • core basis has a red formatter gate;
  • Basis.Run.Server is large and cross-cutting;
  • basis-jcode is ahead by ten commits and has dirty tracked deletions;
  • .basis/self-convergence is too sensitive to publish raw;
  • the dashboard is useful, but it can reach local files and run state, so it needs a loopback-only or authenticated posture.

Current gates

| Gate | Surface | Result |
| --- | --- | --- |
| mise exec -- mix test | core basis | pass: 4 tests, 0 failures |
| mise exec -- mix compile --warnings-as-errors | core basis | pass |
| mise exec -- mix format --check-formatted | core basis | fail: one formatting issue |
| uv run --extra dev pytest -q | basis-hermes | pass: 15 passed |
| npm test | basis-hermes/components/spec-basis-reducer | pass: 4 pass |
| npm test | basis-jcode/components/spec-basis-reducer | pass: 26 pass |

Experiment posture

The experiments should be judged by whether they reduce discretion, not by whether they produce impressive prose.

A Basis experiment is useful when it leaves:

  • source identity and line/range provenance;
  • proposed records with witnesses;
  • validation status;
  • acceptance/rejection/defer decisions;
  • explicit unresolved questions;
  • target projections with loss/caveat notes;
  • enough event history to audit why a record exists.
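
To make the checklist above concrete, here is a minimal sketch of what one such record might look like as a data structure, with a helper that reports what is still missing before the record can be audited. Every name here (ProposedRecord, SourceRange, Decision, audit_gaps, and all field names) is hypothetical and chosen for illustration; none of it is taken from the actual Basis packet schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    DEFERRED = "deferred"


@dataclass(frozen=True)
class SourceRange:
    source_id: str   # hypothetical stable identifier for the source document
    start_line: int
    end_line: int


@dataclass
class ProposedRecord:
    record_id: str
    claim: str
    witnesses: List[SourceRange]
    validation_passed: Optional[bool] = None  # None means validation has not run
    decision: Decision = Decision.PENDING
    open_questions: List[str] = field(default_factory=list)
    projection_caveats: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)


def audit_gaps(record: ProposedRecord) -> List[str]:
    """Return reasons the record cannot yet be audited; an empty list means none found."""
    gaps = []
    if not record.witnesses:
        gaps.append("no witness source range")
    if record.validation_passed is None:
        gaps.append("validation not run")
    if record.decision is Decision.ACCEPTED and not record.validation_passed:
        gaps.append("accepted without passing validation")
    if not record.events:
        gaps.append("no event history")
    return gaps


# Usage with made-up values.
complete = ProposedRecord(
    record_id="r1",
    claim="The reducer emits proposals only.",
    witnesses=[SourceRange("synthetic-spec", 10, 12)],
    validation_passed=True,
    decision=Decision.ACCEPTED,
    events=["proposed", "validated", "accepted"],
)
bare = ProposedRecord(record_id="r2", claim="Unsupported claim.", witnesses=[])
print(audit_gaps(complete))
print(audit_gaps(bare))
```

The point of the sketch is that acceptance is a separate field from the proposal itself, so a record with no witness or no event history is visibly incomplete rather than silently trusted.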

A Basis experiment is not useful merely because it has a pretty dashboard or a large packet. Large packets are not wisdom. They are, at best, disciplined compost.

1. Format-and-boundary slice

Goal: make core basis locally green and extract one responsibility from Basis.Run.Server.

Acceptance:

  • format/test/compile all pass;
  • no semantic behavior expansion;
  • new tests around the extracted boundary.

2. Public-safe self-convergence summary

Goal: create a script in basis-jcode that emits safe summary JSON from .basis/self-convergence.

Allowed output:

  • run ID/hash;
  • counts by packet type;
  • counts by round;
  • validation/convergence status;
  • accepted/rejected/deferred counts;
  • caveats.

Disallowed output:

  • prompt text;
  • packet bodies;
  • NDJSON/log bodies;
  • dashboard state body;
  • source text from working copies.
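
One way to honor the allowed/disallowed split above is an allowlist: the summarizer reads only named metadata keys and never touches anything else. The sketch below assumes packets are already loaded as dictionaries with hypothetical `type`, `round`, and `decision` keys; the real layout of .basis/self-convergence is not known here, and validation/convergence status is left out for brevity.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict, Iterable

# Hypothetical metadata keys; anything not listed here is never read.
ALLOWED_KEYS = ("type", "round", "decision")


def summarize(run_id: str, packets: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    by_type: Counter = Counter()
    by_round: Counter = Counter()
    decisions: Counter = Counter()
    for packet in packets:
        # Copy only allowlisted metadata; bodies, prompts, and logs stay behind.
        meta = {key: packet.get(key) for key in ALLOWED_KEYS}
        by_type[str(meta["type"])] += 1
        by_round[str(meta["round"])] += 1
        if meta["decision"] is not None:
            decisions[str(meta["decision"])] += 1
    return {
        # Publish a short hash of the run ID rather than the ID itself.
        "run_hash": hashlib.sha256(run_id.encode("utf-8")).hexdigest()[:12],
        "counts_by_packet_type": dict(by_type),
        "counts_by_round": dict(by_round),
        "decision_counts": dict(decisions),
        "caveats": ["counts only; packet bodies withheld"],
    }


# Usage with made-up packets; the "body" values must not appear in the output.
packets = [
    {"type": "proposal", "round": 1, "body": "SENSITIVE TEXT"},
    {"type": "decision", "round": 1, "decision": "accepted", "body": "SENSITIVE TEXT"},
    {"type": "proposal", "round": 2, "body": "SENSITIVE TEXT"},
]
summary = summarize("example-run", packets)
print(json.dumps(summary, indent=2, sort_keys=True))
```

An allowlist fails closed: a new sensitive field added to packets later is excluded by default, whereas a denylist would need to be updated to catch it.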

3. Golden public/synthetic spec reduction

Goal: reduce one tiny synthetic spec and one explicitly public permissive spec through basis-hermes, then publish only the packet schema and summarized records.

Acceptance:

  • deterministic output hashes;
  • validation pass;
  • no local paths in publishable examples;
  • reviewer can trace each record to a source range.
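
Two of the acceptance checks above, deterministic output hashes and no local paths, can be sketched with the standard library alone. This is an illustration under assumptions: the function names are hypothetical, the record shape is invented, and "local path" is approximated as any string starting with /Users/ or /home/, which a real gate would likely need to broaden.

```python
import hashlib
import json
import re
from typing import Any, List

# Assumed heuristic: absolute home-directory paths count as local paths.
LOCAL_PATH = re.compile(r"(?:/Users/|/home/)[^\s\"']*")


def canonical_hash(records: Any) -> str:
    # Sorted keys and fixed separators make the encoding independent of dict order.
    encoded = json.dumps(records, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def find_local_paths(records: Any) -> List[str]:
    # Scan the serialized form so nested values are covered too.
    return LOCAL_PATH.findall(json.dumps(records))


# Usage: the same record with keys in a different order hashes identically.
a = {"record": "r1", "source": {"id": "synthetic-spec", "lines": [3, 5]}}
b = {"source": {"lines": [3, 5], "id": "synthetic-spec"}, "record": "r1"}
print(canonical_hash(a) == canonical_hash(b))  # True
print(find_local_paths(a))  # []
print(find_local_paths({"source": "/Users/example/spec.md"}))  # ['/Users/example/spec.md']
```

Hashing a canonical encoding only proves that the serialized records are stable; it says nothing about whether the reduction that produced them is deterministic, which still has to be checked by running it twice.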

4. Steward connectedness pressure

Goal: evaluate whether Basis records help a simple spec-code retrieval task in spec-dataset-evolution-research-project / Steward.

Acceptance:

  • BM25/path baseline exists;
  • Basis features are compared against the baseline;
  • no model training until baseline and hard negatives exist.
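
For reference, a BM25 baseline of the kind named above can be small. The sketch below is a plain Okapi BM25 scorer with whitespace tokenization and the commonly used defaults k1 = 1.5 and b = 0.75; the file names and texts are invented, and nothing here reflects how the Steward project actually indexes specs or code.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75) -> Dict[str, float]:
    if not docs:
        return {}
    tokens = {name: tokenize(body) for name, body in docs.items()}
    n_docs = len(tokens)
    # Guard against an all-empty corpus so the length normalization never divides by zero.
    avg_len = (sum(len(terms) for terms in tokens.values()) / n_docs) or 1.0
    # Document frequency: how many documents contain each term.
    df: Counter = Counter()
    for terms in tokens.values():
        df.update(set(terms))
    scores: Dict[str, float] = {}
    for name, terms in tokens.items():
        tf = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative idf variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return scores


# Usage with made-up code-side documents and a spec-side query.
docs = {
    "reducer_test.exs": "reducer emits proposed records with provenance",
    "dashboard.js": "dashboard renders run state and decisions",
}
scores = bm25_scores("reducer provenance", docs)
best = max(scores, key=scores.get)
print(best)  # reducer_test.exs
```

Any Basis-derived feature would then be judged by whether it improves ranking over these scores on the same queries and hard negatives.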

No-go criteria

Stop or redesign if:

  • accepted Basis state is indistinguishable from model output;
  • UI state becomes canonical;
  • .basis run artifacts are published raw;
  • a benchmark claim lacks a baseline;
  • semantic labels are generated by deterministic framework code instead of explicit model/human packets;
  • hidden prompts/logs/evaluator material leak into public docs.

That is the useful discipline. The project is promising because it already knows what not to make authoritative.