Nightly Src Projects Desk Raw Survey (2026-05-13)

This raw note preserves the public-safe basis for the 2026-05-13 nightly src/ projects desk. It summarizes inspectable repository evidence only: README/docs, manifests, branch and commit metadata, status summaries, safe filenames, mtimes, tests, plans, and visible checked-in artifacts. It does not publish secret-bearing files, local settings, raw prompts/logs/trajectories, private corpus bodies, evaluator payloads, raw benchmark outputs, checkpoint/model artifacts, biometric/capture data, generated media bodies, or sensitive/provocative material.

Where a directory is local-only, sensitive, private-corpus-backed, artifact-heavy, or too skeletal, this note uses category-level wording. A filesystem can be frank without being fit for publication.

Survey scope and method

Survey root: /Users/ericfode/src.
Survey timestamp: 2026-05-13.
Full top-level directory count: 41, including hidden directories.
Execution shape: exactly 10 top-level Hermes survey lane identities, dispatched as one batch of 10.
Lane coverage audit: the controller re-enumerated the 41 top-level directories and confirmed 41 unique assignments, no missing directories, no extras, and no duplicates.
Lane recursion: all 10 top-level lanes reported delegate_task availability and spawned three read-only subteams for purpose/docs/manifests, live-work evidence, and public-safety eligibility. Subteams generally recursed once more into three read-only leaf teams. Two minor leaf-shape exceptions were recorded honestly: one safety subteam inspected a fourth directory directly after three leaf probes, and one docs/manifests subteam reported a three-pass split rather than a perfectly named three-leaf fanout. No top-level fallback lanes were added.
Evidence allowed: README/docs/plans, manifests, branch/status/log metadata, safe modified/untracked filenames, mtimes, tests, checked-in reports, and visible artifacts.
Evidence excluded: secret contents, .env contents, hidden local settings, raw prompts/logs/trajectories, hidden evaluator/supervisor payloads, private corpus bodies, explicit/provocative/unsafe material, raw benchmark outputs, checkpoints/model artifacts, biometric/capture data, generated media bodies, and directories too skeletal for a responsible public claim.
Illustration: generated locally as symbolic SVG editorial art at queries/news-assets/2026-05-13-project-desk-hero.svg; it is not a screenshot.
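
The allowed/excluded evidence split above can be read as a path classifier in which exclusions always win over allowances. The sketch below is illustrative only: the pattern lists, directory names, and the classify function are hypothetical stand-ins, not the rules the survey lanes actually ran.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical patterns for illustration; the survey's real rules are not published here.
ALLOWED_NAMES = ("README*", "SPEC.md", "spec.md", "pyproject.toml", "Cargo.toml", "mix.exs")
EXCLUDED_NAMES = (".env", ".env.*", "*.ckpt", "*.safetensors", "*.jsonl", "*.log")
EXCLUDED_DIRS = {"checkpoints", "logs", "trajectories", "prompts"}
FILENAME_ONLY_DIRS = {"tests", "docs"}


def classify(path: str) -> str:
    """Return 'excluded', 'allowed', 'filename-only', or 'category-only' for a path."""
    p = PurePosixPath(path)
    parents = set(p.parts[:-1])
    # Exclusions are checked first so a README inside a logs directory stays withheld.
    if parents & EXCLUDED_DIRS:
        return "excluded"
    if any(fnmatchcase(p.name, pattern) for pattern in EXCLUDED_NAMES):
        return "excluded"
    if any(fnmatchcase(p.name, pattern) for pattern in ALLOWED_NAMES):
        return "allowed"
    # Tests and docs may be cited by filename without quoting their bodies.
    if parents & FILENAME_ONLY_DIRS:
        return "filename-only"
    # Anything unrecognized falls back to category-level wording.
    return "category-only"
```

Checking exclusions before allowances means an ambiguous path is withheld rather than published, which matches the note's conservative default.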

Ten survey lanes

Lane 1: hidden local assistant settings; internal security-scan artifacts; hidden tinygrad research checkout; privacy-sensitive face/music prototype; early uncommitted harness workspace.
Lane 2: one sensitive social-claim notebook held back by category; basis; basis-hermes; basis-jcode.
Lane 3: cardgame1 / Dungeon Steward; empty creative; deer-flow; gas-city-but-its-just-codex.
Lane 4: gemma-dungeon; gemma4-tinygrad-opt; handterm; hoid.
Lane 5: is-codex-better; is-it-formal; justfooln; kettlebellsim.
Lane 6: skeletal Kimi settings area; local langfuse; local Hermes model runtime; scratch meta-hermes workspace.
Lane 7: nnpl-external-latent-bus; nnpl-shared-bus; nnpl-typed-boundary-ir; openai-symphony.
Lane 8: empty overengineeredlife; silly-pi-stuff; private spec-dataset-evolution-corpus; nested skeletal/internal src workspace.
Lane 9: steward; testing-rl; testing-rl-hermes; textual-world-model.
Lane 10: empty tinygrad; tinygrad-gemma; empty tinygrad-gemma-gemini; tinygrad-gemma-kimi.
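
The lane coverage audit described under survey scope (every top-level directory assigned exactly once, with no missing directories, extras, or duplicates) amounts to a small set comparison. The following is a minimal sketch of that check, not the controller's actual code; the function name and the placeholder directory names in the usage note are assumptions.

```python
from collections import Counter


def audit_lane_coverage(directories, lanes):
    """Compare the enumerated directories against the per-lane assignments.

    directories: iterable of top-level directory names found on disk.
    lanes: iterable of lists, one list of assigned directory names per lane.
    """
    counts = Counter(name for lane in lanes for name in lane)
    expected = set(directories)
    assigned = set(counts)
    return {
        "missing": sorted(expected - assigned),      # on disk, in no lane
        "extras": sorted(assigned - expected),       # in a lane, not on disk
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
        "unique_assignments": len(assigned),
    }


def is_clean(report):
    """True when the audit found no missing, extra, or duplicated names."""
    return not (report["missing"] or report["extras"] or report["duplicates"])
```

For example, with placeholder names, audit_lane_coverage(["a", "b", "c"], [["a", "b"], ["b", "d"]]) reports "c" as missing, "d" as an extra, and "b" as a duplicate. A passing audit for this survey would show 41 unique assignments and three empty lists.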

Public-safe lead candidates

Test-writing, verifier, and evaluation environments

testing-rl evidence: clean git repo on master, tracking origin and ahead by 3 commits. Latest safe commit observed: 2026-05-11, "feat: require ranking lift for local H5 evidence". Safe evidence includes README, SPEC.md, spec.md, pyproject, workflow docs, many docs around rewards/verifiers/dashboards/adapters, Lean formal package material, scripts, and 24 test files by controller count. Safe summary: artifact-first RL environment for training/evaluating agents that write valuable software tests while keeping writer-visible state separate from evaluator-held references. Hidden references, local corpus bodies, raw dashboards, scorer payloads, prompts, and trajectories remain withheld.
testing-rl-hermes evidence: clean git repo on main, with recent May 1-2 commits for history-derived fixtures, inverse-fix mutants, and test-generation RL environment material. Safe evidence includes master plan, RL environment shape docs, pyproject, uv lock, source, reports by filename, benchmark/readme material, and tests. Safe summary: smaller companion prototype packaging repository-history snapshots into bounded test-writing episodes. Supervisor-held fixture/oracle/mutant bodies remain withheld.
is-it-formal evidence: unborn git repo (initialized, no commits yet) with Lean/Lake scaffold, README, Lean semantic modules, example corpus by filename, and a Python grading tool. Safe summary: small Lean/Python scaffold for classifying the formality/grounding of claims. It is coherent but local and uncommitted.
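
The repository-state labels used throughout these entries (clean, dirty, unborn, ahead by N) can be derived from metadata alone, without reading file bodies. The sketch below parses the text produced by git status --porcelain=v2 --branch. It illustrates one way to compute such labels and is not the survey's own tooling; the summarize_status name and the definition of "clean" as no tracked changes and no untracked files are assumptions.

```python
def summarize_status(porcelain_v2: str) -> dict:
    """Summarize the output of `git status --porcelain=v2 --branch`."""
    summary = {
        "branch": None, "unborn": False, "upstream": None,
        "ahead": 0, "behind": 0, "tracked_changes": 0, "untracked": 0,
    }
    for line in porcelain_v2.splitlines():
        if line.startswith("# branch.oid "):
            # Git prints "(initial)" instead of a commit id before the first commit.
            summary["unborn"] = line.split(" ", 2)[2] == "(initial)"
        elif line.startswith("# branch.head "):
            summary["branch"] = line.split(" ", 2)[2]
        elif line.startswith("# branch.upstream "):
            summary["upstream"] = line.split(" ", 2)[2]
        elif line.startswith("# branch.ab "):
            # Format is "# branch.ab +<ahead> -<behind>".
            ahead, behind = line.split(" ")[2:4]
            summary["ahead"], summary["behind"] = int(ahead), -int(behind)
        elif line[:2] in ("1 ", "2 ", "u "):
            # Ordinary, renamed/copied, and unmerged entries are tracked changes.
            summary["tracked_changes"] += 1
        elif line.startswith("? "):
            summary["untracked"] += 1
    # Assumption: "clean" means neither tracked changes nor untracked files.
    summary["clean"] = summary["tracked_changes"] == 0 and summary["untracked"] == 0
    return summary
```

Keeping tracked changes and untracked files as separate counts preserves distinctions the entries rely on, such as a repo with no tracked modifications but many untracked local artifacts.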

Basis, Steward, and spec-code grounding

basis evidence: Elixir/Mix git repo on main, tracking origin, with an untracked generated reducer experiment directory. Latest safe commit: 2026-05-07, "Split reducer UI into separate app entrypoints". Safe evidence includes spec.md, SPEC.md, Mix metadata, reducer and implementation-imaginer component specs, docs, scripts, and tests. Safe summary: draft Elixir/BEAM system for reducing messy prose/spec artifacts into structured, provenance-backed specification state.
basis-hermes evidence: clean Python/Hermes plugin repo on main. Latest safe commit: 2026-05-05, "fix: make basis tool schemas codex-compatible". Safe evidence includes README, plugin.yaml, pyproject, dashboard manifest, reducer/validator source, CLI/tool handlers, and tests. Safe summary: Hermes-native wrapper exposing deterministic Basis reducer and packet-validator surfaces.
basis-jcode evidence: git repo on main, ahead of origin by 10 commits and dirty with tracked deletions in reducer examples/UI; recent safe commits from 2026-05-05 concern the reducer dashboard and durable convergence. Safe summary: category-level Jcode-native reducer/control-plane variant for ledgers, validation, worker packets, and dashboard projections. Raw .basis runs, prompts, streams, validation bodies, worker packets, run graphs, and output artifacts are withheld.
steward evidence: design/prototype git repo on main, dirty with modified docs and many untracked service-support files. Latest safe commit: 2026-05-05, "docs: seed steward design". Safe evidence includes README, architecture/decision/implementation docs, Mix/Ecto/Postgres material, schema/query-contract docs, migrations, and ExUnit tests by filename. Safe summary: early provenance-service kernel for connecting specs, code, tests, reasoning, agent runs, verification, and Git history. It is not production-ready.
The private spec corpus was surveyed only as category-level evidence of a gated research corpus; raw copied artifacts and compliance/report payloads remain private.

Gemma, tinygrad, symbolic game state, and NNPL benches

tinygrad-gemma evidence: git repo on main, ahead of origin by 93 commits, with no tracked modifications but many untracked local artifacts. Latest safe commit observed: 2026-05-07, "Record epoch3 worker round seven stop". Safe evidence includes README, pyproject package metadata, CLI/chat entry points, docs/plans, CI workflow, tests, scripts, benchmark/profiling tool filenames, tokenizer/multimodal/cloud/dev extras, and tinygrad dependency. Safe summary: native tinygrad Gemma implementation with Hugging Face-style checkpoint loading, tokenizer/multimodal support, KV-cache generation, CLI/chat, training/checkpoint helpers, quantization surfaces, and tests. Raw checkpoints, benchmark logs, profiles, progress JSONL, and local artifact bodies are withheld.
gemma-dungeon evidence: dirty git repo on main, with tracked modifications in README, core plan/spec docs, schemas, CLI/minihack code, and tests. Latest safe commit: 2026-05-12, "Add MiniHack manifest count reproducibility audit". Safe evidence includes README, pyproject, specs, schemas, package source, and a large test surface. Safe summary: symbolic roguelike research workspace for auditable model-in-the-loop policy/world-model experiments with explicit game state, legal-action, replay/schema, MiniHack/NLE, and Gemma/tinygrad evaluation surfaces. Dirty state and replay/dataset/logit artifacts keep it side-room rather than front-page.
tinygrad-gemma-kimi and gemma4-tinygrad-opt are category-level optimization sandboxes. One is dirty and undocumented at root; the other lacks a root git repo/README. Summarize them only as Gemma/tinygrad optimization work; do not publish benchmark payloads, scratch patches, reject files, caches, or model-weight inventories.
nnpl-external-latent-bus evidence: non-git Python/NumPy prototype with README, project brief, pyproject, docs, source, artifacts by filename, and 51 pytest files. Safe summary: external/internal latent-bus architecture testing whether an interface bus and recurrent workspace earn their complexity over matched baselines.
nnpl-typed-boundary-ir evidence: non-git Python/tinygrad prototype with README, project brief, pyproject, docs, IR spec, source, result/export filenames, and 35 pytest files. Safe summary: typed boundary IR scaffold exploring validation, legality, auditability, and failure localization at model/interface boundaries.
nnpl-shared-bus evidence: non-git prototype with README, project brief, docs, config files, 9 pytest files, and honest v0 negative/limited result language. Safe summary: useful baseline/negative result, but run artifacts and trace/eval categories keep it category-level.

Harness/control-plane and orchestration side rooms

openai-symphony evidence: dirty Elixir/Phoenix repo on main, tracking origin, with modified app-server/orchestrator/status/dashboard/test files and one hidden skill path. HEAD: 2026-04-27, "fix(elixir): configure Codex app-server model via config". Safe evidence includes README, SPEC, Apache/notice files, Elixir manifest/docs, LiveView/API/dashboard/logging surfaces, tests, and CLI material. Safe summary: engineering-preview orchestration service for moving tracker items into isolated coding-agent workspaces with observability. Logs, workflow/prompt bodies, hidden tooling, and local runtime details are withheld.
deer-flow evidence: public LangGraph/LangChain-style agent harness checkout on main, dirty only in local nginx config and hidden/local environment state. Safe evidence includes multilingual READMEs, install/security/contributing docs, backend Python manifest/uv lock, frontend package manifests, Docker files, tests, scripts, and docs. Safe summary: public super-agent harness checkout with subagents, memory, sandbox, MCP/skills, backend, and frontend surfaces; local config remains private.
gas-city-but-its-just-codex evidence: dirty git repo on codex/native-codex-ui. HEAD: 2026-04-21, "Add Harbor task-level transfer reporting". Safe evidence includes README, Rust workspace, workflow-ledger specs, templates/schemas, MCP/app-server surfaces, operator tooling, docs/scripts, tests, and Lean/formal material. Safe summary: category-level Codex-native durable workflow/control-plane research. Runtime state, transcripts, context boards, benchmark payloads, databases, workflow IDs, logs, and live operator state remain withheld.
another-harness evidence: unborn git repo (no commits yet) with untracked Lean/Lake metadata, docs, tests, tools, benchmarks, and plugins. Safe summary: early Codex/Hermes harness and Lean formalization workspace. No maturity or release claim is justified.
is-codex-better evidence: unborn git repo (no commits yet) with README, docs, plugins, install scripts, and state-procedure material. Safe summary: draft Codex/Hermes harness-extension repo; profile/session/procedure internals remain withheld.

Simulation, terminal, game, interface, and craft work

handterm evidence: clean Rust git repo on master, tracking origin. Latest safe commit: 2026-04-19, "Extract kitty upload state". Safe evidence includes README, Cargo workspace, MIT license, optimization docs, CI, tests, scripts, CPU/GPU renderer structure, and recent graphics/kitty-upload refactors. Safe summary: Wayland-native Rust terminal emulator focused on low-latency, resource-efficient multi-window operation.
cardgame1 / Dungeon Steward evidence: clean Godot repo on branch hermes/combat-stage-art-fallback-upstream, ahead of upstream by 1 commit. Latest safe commit: 2026-04-15, "[verified] fix: harden combat-stage art presentation and asset fallbacks". Safe evidence includes README, project.godot, design docs, data/scenes/source, scripts, and a large tests directory. Safe summary: Godot roguelite deckbuilder prototype with order-sensitive card sequencing, deterministic combat, authored map layouts, and a game-studio scaffold. Generated-art, prompt, model/checkpoint, env/session-log, and simulation artifact surfaces are omitted.
kettlebellsim evidence: clean git repo on codex/reward-audit-and-swing-training, ahead of origin by 36 commits. Latest safe commit: 2026-05-09, "[verified] add bounded Modal Isaac execution wrapper". Safe evidence includes pyproject, docs, runbooks, scripts, configs, source, and broad tests. Safe summary: simulation-first kettlebell swing biomechanics/path-signature toolkit with local deterministic planar gates and permission-gated remote Isaac/Modal probes. Logs, trajectories, rollouts, generated media, run artifacts, checkpoints, and service/account details remain withheld.
FACEMUSIC was surveyed as a privacy-sensitive face-controlled music prototype with web/iOS/Rust/audio/ML components. Because the domain is biometric-adjacent and the tree is dirty/untracked, only category-level mention is appropriate.
hoid was surveyed as a structured world-packet / creative world-studio prototype with active Phoenix work, but creative corpus/story/world/music/comic bodies, prompt/transcript/event data, generated media, and secret/env-bearing categories keep it category-only.
silly-pi-stuff was surveyed as a local Pi companion/pet prototype plus standalone browser cellular-automata demo. It is side-room material at most because the local integration is private and no tests were visible.
textual-world-model was surveyed as a single static concept note for a JEPA-style world model over Git histories. It is a design artifact, not an implemented repo.

Held back from project-specific public detail

The survey fully held back, or reduced to category-only mention, the following: hidden local settings, internal security-scan artifacts, hidden-only or empty directories, one sensitive social-claim notebook, local deployment/model-runner folders, private corpus bodies, prompt/agent/skill instruction bodies, scratch/meta workspaces, generated media, raw logs/prompts/trajectories, evaluator-like payloads, hidden references/oracles, benchmark raw outputs, model/checkpoint artifacts, biometric/capture data, creative story/canon drafts, service configuration, raw test/counterexample bodies, cache/build/vendor directories, and all too-skeletal placeholders.

Editorial synthesis

The publishable movement clusters around six themes:

testing-rl remains the clearest live verifier/test-generation signal, with testing-rl-hermes and is-it-formal as smaller companion prototypes;
Basis-style specification work is broadening toward durable provenance services through basis, basis-hermes, and steward, while basis-jcode remains too artifact-heavy tonight;
Gemma/tinygrad work has both a substantial public-facing package (tinygrad-gemma) and several optimization/replay sandboxes that should not be over-publicized;
NNPL remains most credible where it preserves baselines, typed boundaries, and negative results;
orchestration repos are rich but often dirty, internal, or artifact-heavy, so public copy should emphasize architecture and withhold run state;
craft projects earn front-page treatment the old way: a clean worktree, license, manifest, docs, and tests. handterm is the tidy little theorem here.

A public note can say that much. It should not say more merely because the filesystem was candid.
