Nightly Src Projects Desk Raw Survey (2026-05-24)

This raw note preserves the public-safe basis for the 2026-05-24 nightly src/ projects desk. It summarizes inspectable local evidence only: README/docs/plans, manifests, branch/status/log metadata, safe filenames, mtimes, tests, checked-in reports, and visible artifacts. It does not publish secret-bearing files, .env contents, hidden local settings, raw prompts/logs/trajectories, private corpus bodies, evaluator payloads, raw benchmark outputs, checkpoint/model artifacts, biometric/capture data, generated media bodies, or sensitive/provocative material.

Where a directory is local-only, sensitive, artifact-heavy, private-corpus-backed, or too skeletal, this note uses category-level wording. The point is provenance, not rummaging. Rummaging is for raccoons and, on bad days, software archaeology.

Survey scope and method

Survey root: /Users/ericfode/src.
Survey timestamp: 2026-05-24 01:44 PDT.
Full top-level directory count: 47, including hidden directories.
Execution shape: exactly 10 top-level Hermes survey lane identities, dispatched as two batches of five orchestrator lanes.
Lane coverage audit: controller enumeration found 47 assigned directories, 47 unique assignments, no missing directories, no extras, and no duplicates.
Lane recursion: all 10 top-level lanes reported spawning three read-only subteams, one each for purpose/docs/manifests, live-work evidence, and public-safety/public-summary eligibility. Those subteams in turn reported one further three-way recursion into leaves. The recorded depth is lane → subteams → leaves; no deeper recursion is claimed.
Evidence allowed: README/docs/plans, manifests, branch/status/log metadata, safe modified/untracked filenames, mtimes, tests, checked-in reports, and visible artifacts.
Evidence excluded: secret contents, .env contents, hidden local settings, raw prompts/logs/trajectories, hidden evaluator/supervisor payloads, private corpus bodies, explicit/provocative/unsafe material, raw benchmark outputs, checkpoints/model artifacts, biometric/capture data, generated media bodies, and directories too skeletal for a responsible public claim.
Illustration: deterministic editorial SVG art was generated at queries/news-assets/2026-05-24-project-desk-hero.svg. It is symbolic art, not a screenshot.
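The lane coverage audit described above can be sketched in a few lines of Python. This is a minimal illustration only, not the controller's actual code: the function name audit_lane_coverage, the report keys, and the toy data are all assumptions introduced here. It shows how "no missing, no extras, no duplicates" reduces to set differences and a count per assigned name.

```python
from collections import Counter


def audit_lane_coverage(directories, lanes):
    """Compare a directory listing against lane assignments.

    directories: iterable of top-level directory names (hypothetical input).
    lanes: dict mapping a lane id to the list of directories assigned to it.
    """
    # Flatten every lane's members into one list so repeats stay visible.
    assigned = [name for members in lanes.values() for name in members]
    counts = Counter(assigned)
    expected = set(directories)
    return {
        "assigned": len(assigned),                  # total assignments made
        "unique": len(counts),                      # distinct names assigned
        "missing": sorted(expected - set(counts)),  # listed but never assigned
        "extras": sorted(set(counts) - expected),   # assigned but not listed
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
    }


# Toy data, not the real survey: three directories split over two lanes.
toy_dirs = ["alpha", "beta", "gamma"]
toy_lanes = {"lane-01": ["alpha", "beta"], "lane-02": ["gamma"]}
report = audit_lane_coverage(toy_dirs, toy_lanes)
print(report)
```

A clean audit is one where assigned equals unique and the three lists are empty, which is the shape of the 47/47 result recorded in this note.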

Ten survey lanes

The exact lane count was 10. The sensitive/provocative directory name in lane 01 is intentionally withheld here; it was surveyed and counted, but not publicized.

Lane 01: .claude; .socket-dev-scan; .tinygrad_research; another-harness; one sensitive social-claim notebook withheld by name.
Lane 02: basis; basis-hermes; basis-jcode; cardgame1; creative.
Lane 03: deer-flow; FACEMUSIC; gas-city-but-its-just-codex; gemma-dungeon; gemma4-tinygrad-opt.
Lane 04: handterm; hoid; is-codex-better; is-it-formal; jepa-expriments.
Lane 05: jepa-lang; jepa-poker; justfooln; kettlebellsim; kimi-tests.
Lane 06: langfuse; llama.cpp; local-hermes; meta-hermes; nnpl-external-latent-bus.
Lane 07: nnpl-shared-bus; nnpl-typed-boundary-ir; openai-symphony; overengineeredlife; silly-pi-stuff.
Lane 08: spec-dataset-evolution-corpus; src; steward; testing-rl.
Lane 09: testing-rl-hermes; textual-world-model; tinygrad; tinygrad-gemma.
Lane 10: tinygrad-gemma-gemini; tinygrad-gemma-kimi; unconventional-jepa-lab; word-games.
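As a quick arithmetic check on the lane list above, the tally below transcribes the ten lanes as printed and confirms that they sum to the stated 47 directories with no repeated name. It is a sketch for the reader, not part of the survey tooling; the LANES variable and the "<withheld>" placeholder (standing in for the one directory name this note does not publish) are assumptions made for illustration.

```python
# Lane membership transcribed from the list above.
# "<withheld>" is a placeholder for the one name this note does not publish.
LANES = [
    [".claude", ".socket-dev-scan", ".tinygrad_research", "another-harness", "<withheld>"],
    ["basis", "basis-hermes", "basis-jcode", "cardgame1", "creative"],
    ["deer-flow", "FACEMUSIC", "gas-city-but-its-just-codex", "gemma-dungeon", "gemma4-tinygrad-opt"],
    ["handterm", "hoid", "is-codex-better", "is-it-formal", "jepa-expriments"],
    ["jepa-lang", "jepa-poker", "justfooln", "kettlebellsim", "kimi-tests"],
    ["langfuse", "llama.cpp", "local-hermes", "meta-hermes", "nnpl-external-latent-bus"],
    ["nnpl-shared-bus", "nnpl-typed-boundary-ir", "openai-symphony", "overengineeredlife", "silly-pi-stuff"],
    ["spec-dataset-evolution-corpus", "src", "steward", "testing-rl"],
    ["testing-rl-hermes", "textual-world-model", "tinygrad", "tinygrad-gemma"],
    ["tinygrad-gemma-gemini", "tinygrad-gemma-kimi", "unconventional-jepa-lab", "word-games"],
]

lane_sizes = [len(lane) for lane in LANES]
all_names = [name for lane in LANES for name in lane]

# Seven lanes of five plus three lanes of four: 7 * 5 + 3 * 4 = 47.
print(len(LANES), lane_sizes, len(all_names), len(set(all_names)))
# prints: 10 [5, 5, 5, 5, 5, 5, 5, 4, 4, 4] 47 47
```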

Public-safe lead candidates

JEPA and world-model bench

unconventional-jepa-lab is tonight’s cleanest fresh lead. Evidence: clean main branch, README and mission docs describing a local-first research lab for unconventional JEPA/world-model lanes over typed non-pixel artifacts, and schema material for lane manifests and falsifiable evidence gates. Safe summary: a coordination rig for ten local JEPA research lanes with explicit gates and artifact schemas.
word-games is a strong public candidate with a dirty worktree. Evidence: README and pyproject describe a Story JEPA prototype for compact latent story-character inner-state modeling and future-evidence prediction; the tree is mid-transition from older tree_jepa files to a story_jepa package, with generated run metrics/checkpoints present but not read. Safe summary: a story-interiority JEPA prototype using frozen text embeddings and retrieval-style evaluation; raw runs/checkpoints stay private.
jepa-lang is newly worth mentioning. Evidence: README/pyproject/docs/tests for a small executable IR / neural operational language with typed state, evidence receipts, replayable traces, and audit guards; recent files cluster around deterministic state/replay and IR validation. Safe summary: an auditable IR/replay package for model-adjacent operations.
textual-world-model remains public-safe only from concept and research-loop framing. Evidence: a static explainer for action-conditioned latent prediction over Git histories and research-loop docs around metrics, baselines, leakage-safe splits, and falsification. Safe summary: repo-history world-model research; raw loop state, ledgers, dashboards, corpora, and replay artifacts stay behind the filter.
jepa-poker remains a category-level JEPA poker bench. Evidence: README/pyproject for tinygrad JEPA-style poker world representation and player modeling, plus recent experiment ledger/knowledge-ledger filenames. Safe summary: small imperfect-information poker world-model research; metrics, hand payloads, policies, and artifact bodies are withheld.

Stable verifier and model-runtime benches

testing-rl remains the cleanest verifier/test-generation repo. Evidence: clean tracked tree, ahead of origin by three commits, README/pyproject/docs/formal material, and recent verifier/dashboard/reward work. Safe summary: an RL environment for agents or models that write high-value software tests while evaluator-held references remain private.
tinygrad-gemma remains the strongest model-runtime package. Evidence: README and pyproject for native tinygrad Gemma 4 loading/inference with CLI/chat and multimodal surfaces; branch is ahead of origin with many untracked artifact/benchmark/checkpoint categories. Safe summary: Gemma 4 inference/generation work in tinygrad; raw benchmark numbers, profiles, checkpoints, and unreviewed performance claims remain withheld.
gemma-dungeon remains clean and publishable at a high level. Evidence: README/specs/schemas/tools/tests for an embedding-native, symbolically audited roguelike/world-model workspace. Safe summary: explicit symbolic game state remains authoritative while Gemma-facing projections and replay/evaluation contracts are auditable.
tinygrad-gemma-kimi, gemma4-tinygrad-opt, tinygrad-gemma-gemini, and the empty tinygrad local path were surveyed but stay category-only or absent because they are sparse, scratch-heavy, undocumented, dirty, or artifact-heavy.

Formal/spec/provenance and orchestration bench

basis and basis-hermes remain the cleanest Basis surfaces. Evidence: basis has Elixir/BEAM-oriented spec/state docs and manifests with untracked experiment areas; basis-hermes is a clean Hermes plugin/dashboard wrapper around deterministic spec reduction and packet validation. Safe summary: structured spec-state custody plus a Hermes plugin interface.
basis-jcode is category-only because .basis ledgers, worker packets, streams, and generated artifacts dominate the tree and fall outside the public-safe boundary.
is-it-formal remains public-summary eligible. Evidence: Lean/Lake scaffold, README, examples, grading script, and Lean toolchain metadata for grading how formal a claim is across domains.
steward shows internal service-kernel movement. Evidence: Elixir/Mix/Ecto-style service scaffold, docs and query contracts, dirty worktree, and a recent seed commit. Safe handling: category-only as durable provenance/service-kernel tooling connecting specs, code, tests, reasoning, agent runs, verification, and Git history.
openai-symphony, gas-city-but-its-just-codex, another-harness, is-codex-better, deer-flow, and meta-hermes were surveyed as orchestration/control-plane side rooms. Public handling should stay architectural, covering issue/workspace orchestration, app-server bridges, workflow ledgers, formal scaffolds, and agent-harness extension ideas, and should not expose local logs, prompt bodies, tracker identifiers, provider configs, or runtime state.

Craft, interface, simulation, and neural-native side rooms

handterm remains the cleanest craft lead. Evidence: clean Rust 2024 workspace, MIT license, README, Cargo workspace, CPU/GPU renderers, server/client crates, tests, and recent kitty-graphics/helper refactor history. Safe summary: a resource-efficient Wayland-native terminal.
cardgame1 / Dungeon Steward remains a high-level public candidate. Evidence: Godot project metadata, deterministic combat-loop and balance-simulation docs, scenes/assets/data/scripts/tests, and branch work around combat-stage art fallback. Hold back agent workflow scaffolding, prompt/process artifacts, generated/cache/build material, and raw balance artifacts.
kettlebellsim remains category-level but concrete. Evidence: pyproject, planning docs, source/tests/scripts, and recent bounded simulator wrapper/probe guard work. Safe summary: simulation-first kettlebell swing path-signature research; remote-execution details, logs, trajectories, rollouts, and generated media remain private.
FACEMUSIC is high-level only because it crosses camera/facial capture, music control, and ML. Evidence supports web/iOS/Rust/ML face-controlled music instrumentation, but raw capture/session/model-run details remain private.
The NNPL cluster (nnpl-external-latent-bus, nnpl-shared-bus, nnpl-typed-boundary-ir) has public-docs evidence for latent-bus, shared-bus, and typed-boundary IR experiments, including an honest negative shared-bus v0 result. Raw artifacts, runs, metrics, data exports, traces, model states, and oracle/eval outputs stay held back.
hoid, silly-pi-stuff, justfooln, local-hermes, langfuse, and the private spec corpus were surveyed but should remain category-level: local prototypes, self-hosting/runtime setup, benchmark/counterexample work, prompt-adjacent material, or private corpus boundaries make detail publication unwise.

Held back from project-specific public detail

The survey fully held back, or reduced to category-only mention, the following: hidden local settings, security-scan artifacts, empty or hidden-only directories, one provocative/protected-class-sensitive social-claim notebook, local deployment/model-runner folders, private corpus bodies, prompt/agent/skill instruction bodies, scratch/meta workspaces, generated media, raw logs/prompts/trajectories, evaluator-like payloads, hidden references/oracles, benchmark raw outputs, model/checkpoint artifacts, biometric/capture data, creative story/canon drafts, service configuration, raw test/counterexample bodies, cache/build/vendor directories, and too-skeletal placeholders.

Editorial synthesis

The publishable movement tonight clusters around five claims:

unconventional-jepa-lab, word-games, and jepa-lang make the new JEPA/world-model line more legible than it was in the previous run.
textual-world-model and jepa-poker are real research motion, but their raw loop/artifact bodies stay private.
testing-rl, tinygrad-gemma, and gemma-dungeon remain the sturdy continuing benches: verifier/test generation, model runtime, and symbolic game-state research.
basis, basis-hermes, is-it-formal, and steward keep the formal/spec/provenance corner coherent without publishing run ledgers.
handterm, Dungeon Steward, kettlebellsim, FACEMUSIC, and the NNPL experiments remain useful side rooms, with detail throttled by public-safety and artifact boundaries.

The filesystem offered more than the public page accepted. This is not a failure of curiosity; it is the latch doing its small, necessary job.
