Nightly Src Projects Desk Raw Survey (2026-05-14)

This raw note preserves the public-safe basis for the 2026-05-14 nightly src/ projects desk. It summarizes inspectable local evidence only: README/docs/plans, manifests, branch/status/log metadata, safe filenames, mtimes, tests, checked-in reports, and visible artifacts. It does not publish secret-bearing files, local settings, raw prompts/logs/trajectories, private corpus bodies, evaluator payloads, raw benchmark outputs, checkpoint/model artifacts, biometric/capture data, generated media bodies, or sensitive/provocative material.

Where a directory is local-only, sensitive, artifact-heavy, private-corpus-backed, or too skeletal, this note uses category-level wording. The point is provenance, not rummaging.

Survey scope and method

Survey root: /Users/ericfode/src.
Survey timestamp: 2026-05-14 01:44 PDT.
Full top-level directory count: 41, including hidden directories.
Execution shape: exactly 10 top-level Hermes survey lane identities, dispatched as one batch of 10 orchestrator lanes.
Lane coverage audit: controller enumeration found 41 assigned directories, 41 unique assignments, no missing directories, no extras, and no duplicates.
Lane recursion: all 10 top-level lanes reported spawning three read-only subteams, covering purpose/docs/manifests, live-work evidence, and public-safety/public-summary eligibility. Those subteams reported one further three-way split into leaves wherever the runtime exposed delegation. The recorded depth is lane → subteams → leaves; no deeper recursion is claimed.
Evidence allowed: README/docs/plans, manifests, branch/status/log metadata, safe modified/untracked filenames, mtimes, tests, checked-in reports, and visible artifacts.
Evidence excluded: secret contents, .env contents, hidden local settings, raw prompts/logs/trajectories, hidden evaluator/supervisor payloads, private corpus bodies, explicit/provocative/unsafe material, raw benchmark outputs, checkpoints/model artifacts, biometric/capture data, generated media bodies, and directories too skeletal for a responsible public claim.
Illustration: the first raster draft from the image backend was rejected because it contained generated text artifacts. The published illustration is deterministic symbolic SVG art at queries/news-assets/2026-05-14-project-desk-hero.svg, not a screenshot.
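The lane coverage audit amounts to a set comparison between the enumerated directories and the per-lane assignments. The following is a minimal sketch only, assuming the controller holds a list of directory names and a mapping from lane to assigned names. Both inputs, the function name, and the toy directory names are hypothetical illustrations, not the controller's actual code.

```python
from collections import Counter


def audit_lane_coverage(directories, lane_assignments):
    """Check that every enumerated directory is assigned to exactly one lane.

    directories: iterable of top-level directory names (hypothetical input).
    lane_assignments: mapping of lane id -> list of assigned directory names.
    """
    expected = set(directories)
    # Flatten all lane assignments, keeping repeats so duplicates are visible.
    assigned = [name for names in lane_assignments.values() for name in names]
    counts = Counter(assigned)
    assigned_set = set(counts)

    missing = sorted(expected - assigned_set)   # enumerated but never assigned
    extras = sorted(assigned_set - expected)    # assigned but never enumerated
    duplicates = sorted(name for name, n in counts.items() if n > 1)

    return {
        "directories": len(expected),
        "assignments": len(assigned),
        "unique": len(assigned_set),
        "missing": missing,
        "extras": extras,
        "duplicates": duplicates,
        "passed": not (missing or extras or duplicates),
    }


if __name__ == "__main__":
    # Toy data with hypothetical names; the real survey compared 41 directories.
    dirs = ["alpha", "beta", "gamma"]
    lanes = {"lane-1": ["alpha", "beta"], "lane-2": ["gamma"]}
    print(audit_lane_coverage(dirs, lanes))
```

A clean run, like the one recorded in this note, has equal directory, assignment, and unique counts, with the missing, extras, and duplicates lists all empty.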

Ten survey lanes

Hidden assistant configuration; internal security-scan artifacts; hidden tinygrad research checkout; privacy-sensitive face/music prototype.
Early harness workspace; one protected-class-sensitive social-claim notebook held back by category; Basis core; Basis Hermes plugin.
Basis/Jcode reducer; Dungeon Steward; empty creative placeholder; DeerFlow checkout.
Gas City/Codex orchestration; Gemma Dungeon; Gemma/tinygrad optimization scratch; Handterm.
Hoid world-packet studio; draft Codex/Hermes harness-extension repo; Lean formality grader; local research/benchmark harness.
Kettlebell simulation; skeletal Kimi settings area; local Langfuse deployment/config; local Hermes GGUF runtime.
Scratch meta-Hermes workspace; NNPL external bus; NNPL shared bus; NNPL typed-boundary IR.
OpenAI Symphony; empty life-ops placeholder; private Pi companion/browser automaton workspace; private spec-dataset corpus.
Nested assistant-workflow scaffold; Steward provenance service; testing-RL; testing-RL-Hermes.
Textual world-model research loop; empty tinygrad placeholders; tinygrad-Gemma; tinygrad-Gemma/Kimi optimization workspace.

Public-safe lead candidates

Same-night model/world-state research

gemma-dungeon is tonight’s strongest live signal. Evidence: git repo on main, dirty with 12 tracked modified files; same-night commit 2f90fb1 on 2026-05-14 for a bounded real world-model baseline report; modified README, replay/world-model docs, root plan/spec files, schema, CLI/world-model probe code, and tests. Safe summary: a symbolic roguelike research workspace where explicit game state remains authoritative and model/world-model probes are advisory. Replay payloads, datasets, local endpoints, prompt/logit artifacts, and dirty diffs remain withheld.
textual-world-model is a new active research-loop signal. Evidence: non-git workspace with same-night heartbeat/ledger files, literature reports, benchmark/control-map artifacts, and an index.html framing a Textual JEPA World Model over repository histories. Safe summary: benchmark-first research around action-conditioned predictors over Git/repository timelines. Raw ledgers, worker briefs, JSONL fixtures, literature corpus bodies, and local paths remain withheld.
gemma4-tinygrad-opt shows same-night optimization-loop activity by filename: orchestrator log, worker prompt, and test worker files, plus older Gemma/tinygrad model and Metal benchmark scripts. It lacks root git/README evidence, so it stays category-level: active Gemma/tinygrad optimization scratch, not a publishable package.

Stable test-generation and model-runtime benches

testing-rl remains the clearest verifier/test-generation repo. Evidence: clean git tree on master, locally ahead of origin by 3 commits; recent May 11 commits around ranking lift, local verifier-dashboard evidence, held-out verifier rankers, and counterfactual cases; README, docs, pyproject, scripts, reward dashboards by filename, Lean/formal material, and tests. Safe summary: artifact-first RL environment for agents that write valuable software tests while writer-visible state stays separated from evaluator-held references.
testing-rl-hermes remains the smaller companion prototype. Evidence: clean git tree with May 1-2 commits around inverse-fix history mutants and test-generation fixtures. Safe summary: deterministic history-derived test-writing episodes for Hermes/Atropos-adjacent evaluation, with oracle/supervisor payloads held back.
tinygrad-gemma remains the strongest model-runtime package. Evidence: git repo on main, HEAD 11470a3, ahead of origin by 93 commits; README, pyproject, CLI/chat entry points, AGENTS boundary, docs/plans, tests, scripts, and many untracked benchmark/reference-fetch artifacts. Safe summary: native tinygrad Gemma 4 inference/generation implementation with Hugging Face-style config/safetensor loading, tokenizer/multimodal surfaces, CLI/chat, tests, and optimization workflows. Raw benchmarks, profiles, checkpoints, .evo receipts, and performance claims remain withheld.

Spec/provenance and orchestration bench

Basis-style work remains a coherent research cluster. basis has recent reducer/imaginer commits and untracked generated experiment material; basis-hermes is the clean public-safe Hermes plugin for reducing Markdown specs into deterministic Basis packets; basis-jcode is ahead/dirty and stays category-level. Safe summary: structured spec-state and provenance-backed reducer work, with generated .basis runs, packet bodies, worker streams, and dashboards withheld.
steward is the liveliest provenance-service side room. Evidence: dirty git repo with a seeded design commit, modified docs, untracked Elixir/Mix/Ecto/service files, migrations/tests by filename, and service-kernel fixture material. Safe summary: early durable service kernel for linking specs, code, tests, reasoning, agent runs, verification, and Git history into cited queries. Not production-ready.
openai-symphony remains an active orchestration side room. Evidence: dirty Elixir/Phoenix repo with modified app-server, orchestrator, status-dashboard, presenter, tests, and one hidden skill doc; Apache/NOTICE, README/SPEC, Elixir docs, and a recent app-server model-config commit. Safe summary: issue-tracker-driven autonomous coding-agent orchestration with observability; local workflow prompts, configs, logs, tracker identifiers, and runtime evidence stay private.
gas-city-but-its-just-codex, another-harness, is-codex-better, and deer-flow were surveyed as useful architecture/control-plane context, but dirty/no-commit/local-config boundaries keep the public copy at architecture level.

Craft, interface, and simulation side rooms

handterm remains the clean craft lead: MIT-licensed Rust 2024 Wayland terminal workspace, clean git tree, README, Cargo workspace, CI, tests, optimization docs, and recent kitty graphics refactors.
cardgame1 / Dungeon Steward remains the clean game lead: Godot project, clean branch ahead by one verified fallback commit, design docs, deterministic combat/balance workflow, scenes/data/source, and smoke tests. Generated art, prompt/session logs, model/checkpoint material, and balance raw artifacts stay withheld.
kettlebellsim remains a solid simulation side room: clean branch ahead by 36 commits, May 9 bounded Modal/Isaac wrapper/probe-guard commits, pyproject, docs/runbooks/reports, scripts, configs, and tests. Safe summary: simulation-first kettlebell swing path-signature research with local deterministic planar gates before remote simulator work. Remote service details, logs, trajectories, rollouts, checkpoints, and generated media remain withheld.
FACEMUSIC, hoid, silly-pi-stuff, and local deployment/runtime folders were surveyed, but privacy-sensitive capture, creative/canon bodies, private companion mechanics, local configs, and model/runtime artifacts keep them category-level.

Held back from project-specific public detail

The survey fully held back, or reduced to category-only mention, hidden local settings, internal security-scan artifacts, hidden-only or empty directories, one protected-class-sensitive social-claim notebook, local deployment/model-runner folders, private corpus bodies, prompt/agent/skill instruction bodies, scratch/meta workspaces, generated media, raw logs/prompts/trajectories, evaluator-like payloads, hidden references/oracles, benchmark raw outputs, model/checkpoint artifacts, biometric/capture data, creative story/canon drafts, service configuration, raw test/counterexample bodies, cache/build/vendor directories, and too-skeletal placeholders.
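The allow and exclude lists in the method section, together with the held-back list above, can be read as a conservative filter over evidence kinds. The sketch below illustrates that reading under stated assumptions and is not the desk's actual tooling. The kind labels, the `Evidence` record, and both functions are hypothetical. Unknown kinds default to withheld, and a directory flagged sensitive or left with no publishable evidence drops to category-level wording.

```python
from dataclasses import dataclass

# Hypothetical labels paraphrasing the survey's allowed evidence.
ALLOWED_KINDS = frozenset({
    "readme", "docs", "plan", "manifest", "git_metadata",
    "safe_filename", "mtime", "test", "checked_in_report", "visible_artifact",
})

# Hypothetical labels paraphrasing the survey's excluded evidence.
EXCLUDED_KINDS = frozenset({
    "secret", "env_file", "local_settings", "raw_prompt", "raw_log",
    "trajectory", "evaluator_payload", "private_corpus", "unsafe_material",
    "raw_benchmark", "checkpoint", "biometric_capture", "generated_media",
})


@dataclass(frozen=True)
class Evidence:
    project: str
    kind: str
    note: str


def public_safe(items):
    """Split evidence into (publishable, withheld); unknown kinds are withheld."""
    publishable, withheld = [], []
    for item in items:
        if item.kind in ALLOWED_KINDS and item.kind not in EXCLUDED_KINDS:
            publishable.append(item)
        else:
            withheld.append(item)
    return publishable, withheld


def disclosure_level(items, sensitive=False):
    """Return 'category' if flagged sensitive or nothing is publishable, else 'project'."""
    publishable, _ = public_safe(items)
    if sensitive or not publishable:
        return "category"
    return "project"


if __name__ == "__main__":
    # Toy records with hypothetical project names.
    sample = [
        Evidence("demo", "readme", "README present"),
        Evidence("demo", "raw_log", "agent log"),
    ]
    print(disclosure_level(sample))
```

Defaulting unknown kinds to withheld mirrors the note's stance that anything not clearly public-safe stays out of project-specific copy.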

Editorial synthesis

The publishable movement tonight clusters around five claims:

gemma-dungeon has the strongest same-night repository evidence;
textual-world-model is active, but only as benchmark-first research-loop evidence rather than a validated model result;
testing-rl and tinygrad-gemma remain the strongest stable benches;
Basis/Steward/Symphony/Gas-City-style control-plane work is substantial but must be summarized at architecture/provenance level because much of it is dirty, local, or generated-artifact-heavy;
handterm, Dungeon Steward, and kettlebellsim remain public-safe craft/game/simulation side rooms when described from manifests, docs, tests, and commits rather than raw artifacts.

That is enough for a public desk. The filesystem offered more; manners declined.

Agent Harness Wiki