Nightly Src Projects Desk Raw Survey (2026-05-11)

This raw note preserves the public-safe basis for the 2026-05-11 nightly src/ projects desk. It summarizes inspectable repository evidence only: README/docs, manifests, branch and commit metadata, status summaries, safe filenames, mtimes, tests, plans, and visible checked-in artifacts. It does not publish secret-bearing files, local settings, raw prompts/logs/trajectories, private corpus bodies, evaluator payloads, raw benchmark outputs, checkpoint/model artifacts, biometric/capture data, generated media bodies, or sensitive/provocative material.

Where a directory is local-only, sensitive, private-corpus-backed, artifact-heavy, or too skeletal, this note uses category-level wording. The source tree is not a confessional booth; it is a substrate for evidence.
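As a rough illustration of the rule above, the following Python sketch maps a directory's flags to a wording level. The flag names, the return labels, and the function itself are hypothetical, chosen for this example to mirror the five conditions in the paragraph; they are not taken from any survey tooling.

```python
from typing import Iterable

# Hypothetical flag names mirroring the five conditions named in this note.
CATEGORY_LEVEL_FLAGS = frozenset({
    "local-only",
    "sensitive",
    "private-corpus-backed",
    "artifact-heavy",
    "too-skeletal",
})


def wording_level(flags: Iterable[str]) -> str:
    """Return "category-level" if any limiting flag applies, else "project-specific"."""
    if CATEGORY_LEVEL_FLAGS & set(flags):
        return "category-level"
    return "project-specific"


# Example: a dirty but otherwise ordinary repo keeps project-specific wording,
# while an artifact-heavy one drops to category-level wording.
ordinary = wording_level({"dirty"})
heavy = wording_level({"dirty", "artifact-heavy"})
```

Any single limiting flag is enough to drop to category-level wording in this sketch, which matches the "or" in the rule as written.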

Survey scope and method

Survey root: /Users/ericfode/src.
Survey timestamp: 2026-05-11.
Full top-level directory count: 39, including hidden directories.
Execution shape: exactly 10 top-level Hermes survey lane identities, dispatched as one batch of 10.
Lane recursion: all 10 lane summaries reported delegate_task availability. Each lane spawned a three-way survey team for purpose/docs/manifests, live-work evidence, and public-safety eligibility; each subteam reported one further three-way leaf recursion. Further recursion ended at leaf checks/depth limits.
Controller audit: after the lane summaries returned, a read-only controller audit re-enumerated all 39 top-level directories and spot-checked git state/HEAD/status or non-git top-level shape. No missing directory was found.
Evidence allowed: README/docs/plans, manifests, branch/status/log metadata, safe modified/untracked filenames, mtimes, tests, checked-in reports, and visible artifacts.
Evidence excluded: secret contents, .env contents, hidden local settings, raw prompts/logs/trajectories, hidden evaluator/supervisor payloads, private corpus bodies, explicit/provocative/unsafe material, raw benchmark outputs, checkpoints/model artifacts, biometric/capture data, generated media bodies, and directories too skeletal for a responsible public claim.
Illustration: generated locally as symbolic SVG editorial art at queries/news-assets/2026-05-11-project-desk-hero.svg; it is not a screenshot.
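The git state that the controller audit spot-checks (branch, HEAD, ahead/behind counts, dirty or clean) can be read without opening any file body. The sketch below parses the text produced by `git status --porcelain=v2 --branch`; it is an illustration under assumptions, not the audit's actual tooling. The `RepoState` and `parse_status` names and the sample text (including its hash) are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoState:
    branch: Optional[str] = None    # None when HEAD is detached
    head: Optional[str] = None      # None when the repo has no commits yet
    upstream: Optional[str] = None
    ahead: int = 0
    behind: int = 0
    tracked_changes: int = 0
    untracked: int = 0

    @property
    def dirty(self) -> bool:
        return self.tracked_changes > 0 or self.untracked > 0


def parse_status(text: str) -> RepoState:
    """Parse `git status --porcelain=v2 --branch` output into a RepoState."""
    state = RepoState()
    for line in text.splitlines():
        if line.startswith("# branch.oid "):
            oid = line.split(" ", 2)[2]
            state.head = None if oid == "(initial)" else oid[:7]
        elif line.startswith("# branch.head "):
            head = line.split(" ", 2)[2]
            state.branch = None if head == "(detached)" else head
        elif line.startswith("# branch.upstream "):
            state.upstream = line.split(" ", 2)[2]
        elif line.startswith("# branch.ab "):
            ahead, behind = line.split(" ")[2:4]   # e.g. "+1" and "-0"
            state.ahead, state.behind = int(ahead), abs(int(behind))
        elif line[:2] in ("1 ", "2 ", "u "):       # changed, renamed, unmerged
            state.tracked_changes += 1
        elif line.startswith("? "):                # untracked path
            state.untracked += 1
    return state


# Made-up sample output; the hash and paths are placeholders.
SAMPLE = (
    "# branch.oid abc1234000000000000000000000000000000000\n"
    "# branch.head master\n"
    "# branch.upstream origin/master\n"
    "# branch.ab +1 -0\n"
    "1 .M N... 100644 100644 100644 1111111 2222222 README.md\n"
    "? tests/local_corpus/\n"
)
state = parse_status(SAMPLE)
```

Only filenames and counts pass through this parser, which fits the evidence rules above: status metadata is allowed, file contents are not read.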

Ten survey lanes

Lane 1: hidden local assistant settings; hidden tinygrad research checkout; uncommitted harness workspace; one sensitive social-claim notebook held back by category.
Lane 2: basis; basis-hermes; basis-jcode; cardgame1 / Dungeon Steward.
Lane 3: empty creative; deer-flow; privacy-sensitive FACEMUSIC; gas-city-but-its-just-codex.
Lane 4: gemma-dungeon; gemma4-tinygrad-opt; handterm; hoid.
Lane 5: is-codex-better; is-it-formal; justfooln; kettlebellsim.
Lane 6: skeletal Kimi settings area; local langfuse; local Hermes model runtime; scratch meta-hermes workspace.
Lane 7: nnpl-external-latent-bus; nnpl-shared-bus; nnpl-typed-boundary-ir; openai-symphony.
Lane 8: empty overengineeredlife; silly-pi-stuff; private spec-dataset-evolution-corpus; nested internal skill workspace.
Lane 9: steward; testing-rl; testing-rl-hermes; empty tinygrad.
Lane 10: tinygrad-gemma; empty tinygrad-gemma-gemini; tinygrad-gemma-kimi.
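The lane shape listed above (39 directories in 10 lanes, nine of four and one of three) is what fixed-size chunking of a sorted listing produces. The Python sketch below reproduces that arithmetic with placeholder names. The case-insensitive sort and the lane size of 4 are assumptions inferred from the listing, and `plan_lanes` and `leaf_checks` are hypothetical helpers, not the dispatcher's real interface.

```python
from typing import List, Sequence


def plan_lanes(directories: Sequence[str], lane_size: int = 4) -> List[List[str]]:
    """Sort names case-insensitively and cut them into lanes of lane_size."""
    ordered = sorted(directories, key=str.lower)
    return [ordered[i:i + lane_size] for i in range(0, len(ordered), lane_size)]


def leaf_checks(lane_count: int, fanout: int = 3, depth: int = 2) -> int:
    """Upper bound on leaf checks if every team fans out fully at each level."""
    return lane_count * fanout ** depth


# Placeholder names stand in for the 39 real top-level directories.
placeholders = [f"dir-{i:02d}" for i in range(39)]
lanes = plan_lanes(placeholders)
lane_sizes = [len(lane) for lane in lanes]
max_leaves = leaf_checks(len(lanes))
```

With a three-way team per lane and one further three-way recursion per subteam, the full fan-out would be 10 x 3 x 3 = 90 leaf checks. That figure is an upper bound implied by the stated recursion, not a count reported by the survey.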

Public-safe lead candidates

Test-writing, verifier, and evaluation environments

testing-rl evidence: git repo on master, HEAD 46bbb48 from 2026-05-10, ahead of origin by 1 and dirty with modified README/docs/scripts/tests plus untracked local-corpus test material. Safe evidence includes README, pyproject CLI entries, SPEC.md, workflow docs, artifact schemas, environment contract, non-cheating writer docs, counterfactual/verifier/history docs, Lean material, adapter docs, dashboard pages, and 19 test files by filename. Safe summary: an RL-style environment for training/evaluating agents that write high-value software tests against bounded workspaces and hidden reference/replay evidence. Raw reward feeds, local/private corpus details, dashboard payloads, benchmark bodies, .hermes/.codex internals, and logs/prompts/trajectories remain withheld.
testing-rl-hermes evidence: clean local git repo on main, HEAD 6cbca51 from 2026-05-02, no remote observed. Safe evidence includes MASTER_PLAN.md, pyproject console script, test-generation RL environment docs, benchmark/data strategy, verifier-training/history-fixture docs, reports by filename, source, and 3 test files. Safe summary: artifact-first prototype environment for test-generation agents with supervisor-held references/mutants and deterministic grading concepts. Raw fixture/oracle/mutant/report bodies remain withheld.
is-it-formal evidence: no-commit git repo with Lean/Lake scaffold, README.md, IsItFormal sources, JSON examples by filename, and Python grader tooling. Safe summary: small Lean/Python scaffold for classifying how formal a claim is. It lacks a license and committed history, so public copy should describe it only as a prototype.

Basis, Steward, and spec-code grounding

basis evidence: Elixir/Mix git repo on main, HEAD a5544e0 from 2026-05-07, tracking origin, with an untracked reducer experiments directory. Safe evidence includes spec.md, Mix metadata, reducer and implementation-imaginer component specs, docs, and tests. Safe summary: draft Elixir/BEAM system for reducing prose/spec artifacts into structured, provenance-backed specification state.
basis-hermes evidence: clean Python/Hermes plugin repo on main, HEAD 0061d32 from 2026-05-05, with README, plugin.yaml, pyproject, dashboard manifest, reducer/validator source, CLI/tool handlers, and tests. Safe summary: Hermes-native wrapper exposing deterministic Basis reducer and packet-validator surfaces.
basis-jcode evidence: git repo on main, HEAD 4b1e621 from 2026-05-05, ahead of origin by 10 and dirty with tracked deletions in reducer examples/UI. Safe summary: category-level Jcode-native reducer/control-plane variant for ledgers, validation, worker packets, and dashboard projections. Raw .basis runs, prompts, streams, validation bodies, worker packets, run graphs, and output artifacts are withheld.
steward evidence: design-stage git repo on main, HEAD ba88837 from 2026-05-05, dirty with modified design docs and untracked service-vision/ADR/schema/query-contract material. README explicitly frames the repo as ideation/design only. Safe summary: design-stage semantic/provenance service concept over specs, code, Git history, agent work, reasoning, and verification; not an implemented product.
The private spec corpus was surveyed only as category-level evidence of a gated research corpus; raw copied artifacts and compliance/scan payloads remain private.

Gemma, tinygrad, symbolic game state, and NNPL benches

gemma-dungeon evidence: clean git repo on main, HEAD 1ebd8a8 from 2026-05-11. Safe evidence includes README, pyproject, docs/specs, schemas, tests, CLI/package surfaces, world-model/action-head/replay/policy-eval/runtime/web-viewer tests by filename. Safe summary: embedding-native, symbolically audited roguelike research workspace using explicit game state, legal-action scoring, replay/schema contracts, and Gemma/tinygrad policy experiments. Replay payloads, exports, prompt/logit artifacts, datasets, and internal plans are withheld.
tinygrad-gemma evidence: git repo on main, ahead of origin by 93, tracked tree clean with many untracked local artifacts. Safe evidence includes README, pyproject package tinygrad-gemma, CLI/chat entry points, docs/configs/benchmarks/scripts/tests, CI workflow, and recent 2026-05-06/07 worker-round commits. Safe summary: native tinygrad Gemma 4 implementation with local checkpoint loading, tokenizer and multimodal support, KV-cache generation, CLI/chat, training/checkpoint helpers, quantization surfaces, and tests. Raw checkpoints, benchmark logs, performance claims, and untracked artifact bodies are withheld.
gemma4-tinygrad-opt and tinygrad-gemma-kimi are category-level optimization sandboxes. The former lacks a top-level git repo/README; the latter is dirty on opt/attention with modified core/benchmark files and raw results/patch artifacts. Summarize them as Gemma/tinygrad optimization work only; do not publish benchmark payloads or patch-race artifacts.
nnpl-external-latent-bus evidence: non-git Python/NumPy prototype with README, project brief, pyproject, docs, source, artifacts by filename, and 52 test files. Safe summary: external/internal latent-bus architecture for option-preserving planning and bridge-dependence probes.
nnpl-typed-boundary-ir evidence: non-git Python/tinygrad prototype with README, project brief, pyproject, docs, data/readme, source, results by filename, and 37 test files. Safe summary: typed IR boundaries for validated planning artifacts, legality, auditability, deterministic rendering, and failure localization.
nnpl-shared-bus records useful negative/limited shared-bus experiment evidence but is kept category-level because run/checkpoint/trace/eval artifact categories dominate the visible surface.

Harness/control-plane and orchestration side rooms

gas-city-but-its-just-codex evidence: dirty git repo on codex/native-codex-ui, HEAD 198aefc from 2026-04-21, with README, Rust workspace, workflow-ledger specs, templates/schemas, MCP/gRPC/app-server surfaces, operator tooling, docs/scripts, tests, and Lean/formal material. Safe summary: category-level Codex-native durable workflow/control-plane research. Runtime state, transcripts, context boards, benchmark payloads, databases, workflow IDs, logs, and live operator state remain withheld.
another-harness evidence: no-commit git repo on main with hundreds of untracked entries, Lean/Lake metadata, docs, tests, tools, benchmarks, and plugins. Safe summary: early Codex/Hermes harness and Lean formalization workspace. No maturity or release claim is justified.
openai-symphony evidence: dirty Elixir/Phoenix repo on main, HEAD 58cf97d from 2026-04-27, with README/SPEC, Elixir manifest/docs, LiveView/API/dashboard/logging/token-accounting material, tests, and modified app-server/orchestrator/status files. Safe summary: engineering-preview orchestration service for issue-tracker-driven isolated coding-agent runs. Logs, workflow/prompt bodies, hidden tooling, and local runtime details are withheld.
deer-flow evidence: public LangGraph/LangChain-style agent harness checkout with backend/frontend/Docker/docs/tests, dirty local nginx config and .flox state. Safe summary: public super-agent harness checkout; local config remains private.
is-codex-better evidence: no-commit draft repo with README/docs/plugins/install scripts/state procedure material. Safe summary: category-level draft Codex/Hermes harness-extension repo; profile/session/procedure internals remain withheld.

Simulation, terminal, interface, and craft work

kettlebellsim evidence: clean git repo on codex/reward-audit-and-swing-training, ahead of origin by 36, HEAD 1d973de from 2026-05-09. Safe evidence includes package metadata, planning/gate docs, bounded Modal/Isaac wrapper acceptance docs, scripts, configs, recipes, skills, and 97 test files by filename. Safe summary: simulation-first kettlebell swing biomechanics/path-signature toolkit with local deterministic planar gates and permission-gated remote Isaac/Modal probes. Logs, trajectories, rollouts, generated media, run artifacts, checkpoints, and service/account details remain withheld.
handterm evidence: clean Rust git repo on master, HEAD 977e709 from 2026-04-19, with README, Cargo workspace, MIT license, optimization docs, CI, tests, and recent graphics/kitty-upload refactors. Safe summary: Wayland-native Rust terminal emulator focused on low-latency, resource-efficient multi-window operation.
FACEMUSIC was surveyed as a privacy-sensitive face-controlled music prototype with web/iOS/Rust/ML components. Because the domain is biometric-adjacent and the tree is dirty/untracked, only category-level mention is appropriate.
hoid was surveyed as a structured world-packet / creative world-studio prototype with active Phoenix work, but creative corpus/story/world/music/comic bodies, prompt/transcript/event data, generated media, and secret/env-bearing categories keep it category-only.
cardgame1 / Dungeon Steward has real Godot project and test/design evidence, but generated-art, prompt, model/checkpoint, ignored env/session-log, and simulation artifact surfaces keep this run at category level.

Held back from project-specific public detail

The survey fully held back, or reduced to category-only mention, hidden local settings, hidden-only or empty directories, one sensitive social-claim notebook, local deployment/model-runner folders, private corpus bodies, prompt/agent/skill instruction bodies, scratch/meta workspaces, generated media, raw logs/prompts/trajectories, evaluator-like payloads, hidden references/oracles, benchmark raw outputs, model/checkpoint artifacts, biometric/capture data, creative story/canon drafts, service configuration, raw test/counterexample bodies, cache/build/vendor directories, and all too-skeletal placeholders.

Editorial synthesis

The publishable movement clusters around six themes:

test-generation and verifier environments are the strongest live-work signal tonight, with testing-rl moving on May 10 and testing-rl-hermes preserving the smaller prototype lineage;
specification work is spreading from Basis packets into Steward-style durable provenance services;
Gemma/tinygrad work now includes both model-runtime benches and a symbolic roguelike environment that can expose policy/action-head claims through schemas and tests;
NNPL remains useful when it preserves negative results and typed-boundary claims rather than merely promising latent magic;
orchestration repos are rich but often dirty, internal, or artifact-heavy, so public copy should emphasize architecture and withhold run state;
craft projects remain publishable when they bring ordinary proofs of life: README, license, manifests, tests, clean git state. The Cargo manifest remains a modest but dignified epistemology.

A public note can say that much. It should not say more merely because the filesystem was candid.
