Nightly Src Projects Desk Raw Survey (2026-05-31)

This raw note preserves the public-safe basis for the 2026-05-31 nightly src/ projects desk. It uses inspectable local evidence only: README/docs/plans, manifests, branch/status/log metadata, safe modified/untracked filenames, mtimes, tests, checked-in reports, and visible artifacts. It does not publish secret-bearing files, .env contents, hidden local settings, raw prompts/logs/trajectories, private corpus bodies, evaluator/oracle payloads, raw benchmark bodies, checkpoint/model artifacts, biometric/capture data, generated media bodies, or explicit/provocative material.

Where a directory is local-only, sensitive, artifact-heavy, private-corpus-backed, or too skeletal, this note uses category-level wording. A filesystem is not a press release; this is one of its more civilized properties.

Survey scope and method

Survey root: /Users/ericfode/src.
Survey timestamp: 2026-05-31 01:35 PDT.
Full top-level directory count: 50, including hidden directories.
Execution shape: exactly 10 top-level Hermes survey lane identities, dispatched as one batch of ten orchestrator lanes.
Lane coverage audit: controller enumeration found 50 assigned directories, 50 unique assignments, no missing directories, no extras, and no duplicates.
Lane recursion: all 10 lanes completed. Each lane reported spawning three read-only evidence subteams for purpose/docs/manifests, live-work evidence, and public-safety/public-summary eligibility; each subteam reported a further three-way leaf probe. The recorded depth is lane → subteams → leaves; no deeper recursion is claimed.
Evidence allowed: README/docs/plans, manifests, branch/status/log metadata, safe modified/untracked filenames, mtimes, tests, checked-in reports, and visible artifacts.
Evidence excluded: secret contents, .env contents, hidden local settings, raw prompts/logs/trajectories, hidden evaluator/supervisor payloads, private corpus bodies, explicit/provocative/unsafe material, raw benchmark bodies, checkpoints/model artifacts, biometric/capture data, generated media bodies, and directories too skeletal for responsible public claims.
Illustration: raster image generation was attempted but unavailable because the configured backend lacked FAL_KEY. The final editorial illustration was generated locally as deterministic symbolic SVG at queries/news-assets/2026-05-31-project-desk-hero.svg. It is an illustration, not a screenshot.
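
The allowed/excluded evidence rules above can be read as a deny-first filename filter. The sketch below is illustrative only and is not the survey's actual tooling: the pattern lists, the classify function, and the "review" fallback are hypothetical names chosen for this example, and the patterns cover only a few of the excluded categories.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical patterns, assumed for illustration; the real survey rules are
# category-based and are not reducible to this short list.
EXCLUDED = (".env", ".env.*", "*.ckpt", "*.safetensors", "*.gguf", "*.jsonl", "*.log")
ALLOWED = ("README*", "LICENSE*", "pyproject.toml", "*.md", "test_*.py")


def classify(path: str) -> str:
    """Return 'excluded', 'allowed', or 'review' for a path, by filename only."""
    name = PurePosixPath(path).name
    # Deny rules are checked first, so an excluded file is never promoted.
    if any(fnmatchcase(name, pattern) for pattern in EXCLUDED):
        return "excluded"
    if any(fnmatchcase(name, pattern) for pattern in ALLOWED):
        return "allowed"
    # Anything unrecognized defaults to manual review rather than publication.
    return "review"


samples = ["repo/README.md", "repo/.env", "repo/model.safetensors", "repo/notes.txt"]
verdicts = {path: classify(path) for path in samples}
```

Checking the excluded list before the allowed list means a file matching both is held back, and routing unmatched names to review keeps the default conservative.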

Ten survey lanes

The exact top-level lane count was 10. One provocative/protected-class-sensitive directory in lane 01 is intentionally withheld by name; it was surveyed and counted, but not publicized.

Lane 01: .claude; .socket-dev-scan; .tinygrad_research; another-harness; one sensitive social-claim directory withheld by name.
Lane 02: basis; basis-hermes; basis-jcode; cardgame1; creative.
Lane 03: deer-flow; FACEMUSIC; gas-city-but-its-just-codex; gemma-dungeon; gemma4-tinygrad-opt.
Lane 04: handterm; hoid; iii-wiki; is-codex-better; is-it-formal.
Lane 05: jepa-expriments; jepa-lang; jepa-poker; justfooln; kettlebellsim.
Lane 06: kimi-tests; langfuse; llama.cpp; local-hermes; meta-hermes.
Lane 07: nnpl-external-latent-bus; nnpl-shared-bus; nnpl-typed-boundary-ir; openai-symphony; overengineeredlife.
Lane 08: parenting-bookshelf-compass; quiz; silly-pi-stuff; spec-dataset-evolution-corpus; src.
Lane 09: steward; testing-rl; testing-rl-hermes; textual-world-model; tinygrad.
Lane 10: tinygrad-gemma; tinygrad-gemma-gemini; tinygrad-gemma-kimi; unconventional-jepa-lab; word-games.
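
The lane coverage audit reported under survey scope (assigned count, unique count, missing, extras, duplicates) amounts to a partition check of the lane lists against the enumerated directories. A minimal sketch follows, assuming the lanes are available as lists of directory names. The function name, the report keys, and the toy directory names are hypothetical and are not taken from the controller.

```python
from collections import Counter


def audit_lane_coverage(top_level_dirs, lanes):
    """Compare lane assignments against the enumerated top-level directories."""
    assigned = [name for lane in lanes for name in lane]
    counts = Counter(assigned)
    expected = set(top_level_dirs)
    seen = set(counts)
    return {
        "assigned": len(assigned),          # total assignments, repeats included
        "unique": len(seen),                # distinct directory names assigned
        "missing": sorted(expected - seen), # enumerated but never assigned
        "extras": sorted(seen - expected),  # assigned but not enumerated
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
    }


# Toy data only: these are placeholder names, not the surveyed directories.
dirs = ["alpha", "beta", "gamma", "delta"]
clean = audit_lane_coverage(dirs, [["alpha", "beta"], ["gamma", "delta"]])
broken = audit_lane_coverage(dirs, [["alpha", "alpha"], ["gamma", "omega"]])
```

A clean audit has equal assigned and unique counts and three empty lists, which is the shape of the 50/50 result recorded above. In the broken toy case, the report names the duplicated, missing, and extra entries.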

Public-safe lead candidates

Game, symbolic-world, verifier, and simulation work

gemma-dungeon is tonight’s cleanest same-night research/game lead. Evidence: README, goal/spec/implementation docs, pyproject, schemas, tests, clean main, and 2026-05-30/31 commits around verified eval/train-gap and sweep-best status-token work. Safe summary: embedding-native roguelike/world-model research where symbolic game state remains authoritative and model-facing projections are auditable. Hold back prompts, replay/example JSON bodies, generated packs, private corpora, and model/checkpoint artifacts.
cardgame1 / Dungeon Steward remains a solid game-craft lead. Evidence: Godot project, README, MIT license, design/docs/data, branch hermes/combat-stage-art-fallback-upstream ahead by one commit, recent work around combat-stage art fallback, map hover legality, and authored floor-one map layout. Safe summary: browser-first fantasy roguelite deckbuilder prototype with deterministic combat/map/reward systems and generated-art fallback handling. Hold back .beads, agent state, raw JSONL balance outputs, imagegen prompts/inputs/outputs, generated media, and imported/generated Godot artifacts.
testing-rl remains the stable verifier/test-generation bench. Evidence: README/SPEC/WORKFLOW, pyproject, docs/formal material, branch master ahead of origin by three commits, and clean status with recent commits around verifier dashboard evidence, held-out verifier ranking, live rewards dashboard, and counterfactual case breakdowns. Safe summary: an RL/test-generation environment with evaluator-held references and local verifier evidence. No training-victory claim is supported by the inspected evidence.
testing-rl-hermes is a sibling sanitized bench for deterministic history-derived test-generation fixtures and grading logic. Evidence: docs, benchmark fixtures, clean main, and recent commits adding inverse-fix history mutants and materialized history commits as fixtures. Hold back hidden reference/mutant/oracle trees and generated reports.
kettlebellsim is the clean simulation lead. Evidence: clean branch codex/reward-audit-and-swing-training, ahead by 36 commits, pyproject, docs/configs/scripts/tests, and visible work on bounded Modal Isaac probe execution wrappers/guards plus planar local-to-remote restart validation. Safe summary: deterministic local restart and validation before bounded remote simulator/RL execution.

Orchestration, provenance, and formal/spec work

The Basis cluster remains coherent. basis shows active Elixir/BEAM spec-basis rewrite/reducer work, branch ahead by one, and a 2026-05-24 imaginer workflow commit; basis-hermes is the clean plugin/dashboard slice exposing deterministic reducer and packet validation tools; basis-jcode is useful but artifact-heavy, ahead by 10 with .basis ledgers, prompts, NDJSON, dashboard outputs, and tracked deletions. Safe summary: structured spec-state custody and deterministic/provenance-backed reduction, not raw packet publication.
another-harness is a side-room item at high level only: it is a Lean-backed Codex/Hermes harness experimentation repo that has no commits yet and consists entirely of untracked project content. Publish only the curated purpose, not state, prompts, logs, or operational internals.
is-it-formal is publishable as a narrow project summary: a Lean/Python scaffold for grading formalization strength across domains. Caveat: no visible license and no commits yet, so do not imply public code-release status.
openai-symphony has a clear public concept (isolated autonomous implementation runs managed by an Elixir reference service and dashboard), but local modified app-server/orchestrator/dashboard/test files, logs, prompts, and token/accounting operational details keep it side-room only.
steward is promising but held back: local evidence supports an Elixir/Postgres semantic provenance/query service for agentic software work, yet the dirty/untracked service tree, private-corpus references, no visible license, and local config surfaces require redaction.

JEPA, NNPL, tinygrad/Gemma, systems craft, and humane artifacts

jepa-lang is the clean small IR/replay artifact: README, pyproject, docs, tests, and source files support a deterministic typed-operation IR with replayable traces and evidence receipts.
jepa-poker is public-safe at high level: JEPA/world-model experiments for imperfect-information poker, visibly oriented around Kuhn/Leduc/player benchmarking. Hold back experiment ledgers, match/hand outputs, and raw benchmark artifacts.
unconventional-jepa-lab is a strong research-bench summary candidate: branch ahead by one, dirty lane packet/evidence manifests, README/mission/gates/lane docs, and explicit falsification gates. Redact private/local-path context, .beads, .codex, .gascity, and raw packet details.
The NNPL trio is useful as concept-level research context: external latent bus, shared bus with a documented negative v0 result, and typed-boundary IR. All three require run/result/export artifact redaction.
tinygrad-gemma is technically rich but artifact-heavy: native tinygrad Gemma 4 runtime/experimentation surfaces, branch ahead by 93, untracked benchmark/reference-fetch artifacts, and checkpoint/evolution-state boundaries. Publish only sanitized summaries, not performance claims or repo snapshots.
tinygrad-gemma-kimi is a separate optimization scratch repo on opt/attention with dirty files, patch/reject/test/base artifacts, and result JSONs; summarize only as experimental tinygrad/Gemma optimization work.
word-games has pivoted toward Story JEPA / character-interiority modeling. It is interesting, but generated runs/checkpoints/metrics and missing license keep it a curated side room.
handterm remains a clean systems-craft note: MIT-licensed Rust/Wayland terminal work with CPU/GPU components and clean upstream-tracking status.
parenting-bookshelf-compass is a clean humane/static artifact: README, index.html, clean main, and a recent publish commit support a non-diagnostic parenting-books compass quiz summary.

Held back from project-specific public detail

The survey either fully held back or reduced to category-only mention the following: hidden local assistant/settings directories, security/dependency scan artifacts, empty or too-skeletal directories, one provocative/protected-class-sensitive social-claim notebook, local deployment/model-runner folders, private corpus bodies, prompt/agent/skill instruction bodies, scratch/meta workspaces, generated media, raw logs/prompts/trajectories, evaluator/oracle payloads, raw benchmark outputs, model/checkpoint artifacts, biometric/capture data, creative/canon/world-packet drafts, service configuration, raw test/counterexample bodies, local .env-style material, cache/build/vendor directories, and dirty patch/reject variants.

Specific category-level handling: FACEMUSIC has meaningful face-control music evidence, but its camera/facial-capture and generated/model material make owner review appropriate; spec-dataset-evolution-corpus is explicitly private and remains unpublished; llama.cpp and .tinygrad_research are recognized as public upstream/reference substrates rather than local original leads; langfuse, local-hermes, kimi-tests, quiz, overengineeredlife, creative, the empty tinygrad directory, tinygrad-gemma-gemini, and the nested src directory were not promoted into public claims.

Editorial synthesis

The publishable movement tonight clusters around six claims:

gemma-dungeon is the clean same-night lead; cardgame1 carries the game-craft line.
testing-rl and testing-rl-hermes remain the verifier/test-generation bench.
kettlebellsim is the clearest simulation-validation lead.
Basis/Hermes, another-harness, is-it-formal, openai-symphony, and steward form the formal/spec/provenance/control-plane room, but most of that room remains side-room material under redaction.
jepa-lang, jepa-poker, unconventional-jepa-lab, the NNPL trio, tinygrad-gemma, tinygrad-gemma-kimi, and word-games belong on the research bench with narrower claims than their artifact directories might tempt.
handterm and parenting-bookshelf-compass are the tidy public-safe side notes.
