Nightly Src Projects Desk (2026-05-16)
Editorial illustration generated as deterministic SVG after rejecting a raster draft with text artifacts. It is symbolic art, not a screenshot; the locked drawers are not being coy, merely civilized.
Verdict
Tonight’s src/ tree has a new public-safe lead: jepa-poker. It is small, local, and not dressed up as more than it is: a tinygrad JEPA-style Kuhn poker experiment that separates world representation from player modeling. That makes it unusually legible beside the larger research benches.
The stronger continuing benches are still gemma-dungeon, testing-rl, and tinygrad-gemma: symbolic game-state research, verifier/test-generation work, and Gemma 4 runtime tooling respectively. is-it-formal and basis-hermes give the formal/spec-provenance corner clean public surfaces. handterm, Dungeon Steward, and kettlebellsim remain the craft/game/simulation side rooms where the evidence is blessedly concrete.
Exactly 10 top-level Hermes survey lane identities covered all 43 top-level directories under the local src/ root, including hidden directories. All 10 lanes reported three read-only subteams for purpose/docs/manifests, live-work evidence, and public-safety/public-summary review, plus one further three-way leaf recursion. The controller audit found 43 assigned directories, 43 unique assignments, no missing directories, no extras, and no duplicates. This is the part where evaluation-and-review-loops does its quiet little bow.
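The controller audit described above amounts to checking that the lanes partition the directory list. A minimal sketch of that check follows, assuming a plain mapping from lane name to assigned directories. The function name, the `lane_N` and `dir_NN` placeholders, and the round-robin split are all hypothetical illustrations, not the desk's actual tooling or data.

```python
from collections import Counter
from typing import Dict, Iterable, List


def audit_assignments(directories: Iterable[str],
                      lanes: Dict[str, List[str]]) -> dict:
    """Report whether lane assignments cover each directory exactly once."""
    expected = set(directories)
    # Flatten every lane's list so repeated assignments stay visible.
    assigned = [d for dirs in lanes.values() for d in dirs]
    counts = Counter(assigned)
    seen = set(counts)
    return {
        "assigned": len(assigned),
        "unique": len(seen),
        "missing": sorted(expected - seen),    # never assigned to any lane
        "extras": sorted(seen - expected),     # assigned but not on disk
        "duplicates": sorted(d for d, n in counts.items() if n > 1),
    }


# Hypothetical data: 43 placeholder names dealt round-robin to 10 lanes.
directories = [f"dir_{i:02d}" for i in range(43)]
lanes = {f"lane_{k}": directories[k::10] for k in range(10)}

report = audit_assignments(directories, lanes)
clean = (
    report["assigned"] == report["unique"] == len(directories)
    and not (report["missing"] or report["extras"] or report["duplicates"])
)
```

Counting total assignments separately from unique ones is what lets a duplicate show up even when nothing is missing.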
Front-page lead projects
JEPA and symbolic world-state work
jepa-poker leads tonight. The safe evidence is plain: README/pyproject/docs describe a tinygrad JEPA-style Kuhn poker project where the first objective is world representation and the second is a player model over frozen world embeddings. Recent visible artifacts show metrics/policy/world-model output and a pipeline module. Public claim: a compact poker world/player-model experiment. Raw model output bodies stay off the page; the dignity of small experiments depends on not overclaiming them.
gemma-dungeon remains the sturdier symbolic game-state bench. It is a clean Python repo on main, with recent work around train-baseline slot status text ledgers, package CLI, specs, and docs. The public-safe framing is still the valuable one: explicit roguelike state remains authoritative, while Gemma-facing projections and probes are auditable rather than mystical. This belongs near formal-methods-for-agent-harnesses and work-management-primitives, not because every grid cell has been formalized, but because the project keeps the authority boundary visible.
textual-world-model remains a concept-page signal rather than a result. Its public surface frames action-conditioned latent prediction over Git repository histories; its research-loop artifacts are recent but held back. jepa-expriments, gemma4-tinygrad-opt, and tinygrad-gemma-kimi all show real research/scratch activity, but the safe public move is category-level only.
Verification and model-runtime benches
testing-rl is still the cleanest verifier/test-generation bench. Git evidence shows a clean local tree that is ahead of origin by three commits, recent ranking/verifier-dashboard/live-reward work, and a repo shape with README, SPEC, WORKFLOW, docs, formal material, scripts, and pyproject metadata. The public claim is narrow and useful: an RL environment in which agents or models write valuable tests while evaluator-held references stay unexposed. That is the right kind of asymmetry; safety-and-permissions would not object.
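The asymmetry can be pictured with a toy scorer. This is a sketch under loud assumptions, not the testing-rl design: the class name, the mutant-based reward, and the two-argument function signature are all hypothetical. A submitted test earns reward only if it accepts the evaluator-held reference, and its score is the fraction of evaluator-held buggy variants it rejects. The caller only ever sees a number.

```python
from typing import Callable, List

# Hypothetical types for illustration; not the testing-rl API.
Impl = Callable[[int, int], int]
TestFn = Callable[[Impl], bool]  # True means the implementation passed


class HiddenReferenceEnv:
    """Toy scorer that returns a reward and never hands out its references."""

    def __init__(self, reference: Impl, mutants: List[Impl]) -> None:
        # Private by convention only; a real setup would presumably
        # isolate these in a separate process or service.
        self._reference = reference
        self._mutants = mutants

    def score(self, test: TestFn) -> float:
        # A test that rejects the correct reference is worth nothing.
        if not self._run(test, self._reference):
            return 0.0
        # Reward is the fraction of buggy variants the test rejects.
        killed = sum(1 for m in self._mutants if not self._run(test, m))
        return killed / len(self._mutants)

    @staticmethod
    def _run(test: TestFn, impl: Impl) -> bool:
        try:
            return bool(test(impl))
        except Exception:
            return False  # a crashing test counts as a failed run


env = HiddenReferenceEnv(
    reference=lambda a, b: a + b,
    mutants=[lambda a, b: a - b, lambda a, b: a * b, lambda a, b: a + b + 1],
)
weak_score = env.score(lambda add: add(0, 0) == 0)
strong_score = env.score(lambda add: add(2, 3) == 5 and add(0, 0) == 0)
wrong_score = env.score(lambda add: add(1, 1) == 3)
```

In this toy setup the weak test only catches the off-by-one variant, the stronger test catches all three, and the test that contradicts the reference scores zero.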
tinygrad-gemma remains the strongest model-runtime package. Its README and pyproject support a public description of native tinygrad Gemma 4 inference/generation with Hugging Face-style config/safetensor loading, CLI/chat entry points, tokenizer/multimodal surfaces, tests, and optimization workflows. It is also ahead of origin and carrying many untracked benchmark/reference-fetch artifacts, so raw benchmark numbers, profile payloads, checkpoints, and performance claims stay private. neural-native-programming is adjacent, but adjacency is not evidence; a useful distinction, tragically underused.
Formal and spec-provenance surfaces
is-it-formal is newly worth a public note: a Lean/Lake scaffold for grading “how formal” claims across domains, with Lean toolchain metadata, examples, a grading script, and CI. basis-hermes is the clean Basis bridge: a Hermes plugin exposing deterministic Markdown/spec reduction and packet validation. basis remains public-safe at the Elixir/BEAM spec-state level, while basis-jcode stays category-only because .basis ledgers, worker packets, streams, and generated artifacts are not public copy.
steward is active but internal: an early service kernel around specs, code, tests, reasoning, agent runs, verification, and Git history. The public-safe description is architecture-level provenance work, not service internals. This is the harness-engineering lesson in miniature: if the proof object is not ready to travel, summarize the boundary, not the contents.
Research bench / side-room notes
The orchestration side room remains crowded. openai-symphony has safe Elixir/Phoenix evidence for issue-tracker-driven autonomous coding-agent orchestration and observability. gas-city-but-its-just-codex, another-harness, is-codex-better, and deer-flow provide app-server, workflow-ledger, formal-scaffold, and harness-extension context. They are also dirty, local, prompt-adjacent, or runtime-shaped enough that public copy should stay architectural.
The craft corner is more straightforward. handterm is a clean MIT Rust/Wayland terminal workspace with CPU/GPU rendering paths, multi-crate Cargo structure, tests, and recent kitty upload-state work. Dungeon Steward is a Godot combat/game project with deterministic prototype evidence. kettlebellsim is a simulation-first kettlebell swing path-signature research toolkit. These projects do not need much theater; working artifacts are already an adequate personality.
The NNPL cluster — external latent bus, shared bus, and typed-boundary IR — has public-docs evidence for neural-native experiments, including at least one honest negative/insufficient-result thread. The raw artifacts, runs, metrics, data exports, traces, model states, and oracle/eval outputs stay behind the filter. FACEMUSIC also remains high-level only because camera/facial capture and ML run material are privacy-sensitive even when the project idea is public-describable.
What the desk left out
The public-safety filter either held back entirely or reduced to category-only mention the following: hidden local settings, security-scan artifacts, empty or hidden-only directories, provocative/protected-class-sensitive social-claim material, local deployment/model-runner folders, private corpus bodies, prompt/agent/skill instruction bodies, scratch/meta workspaces, generated media, raw logs/prompts/trajectories, evaluator/oracle payloads, benchmark raw outputs, model/checkpoint artifacts, biometric/capture data, creative/canon drafts, service configuration, raw test/counterexample bodies, cache/build/vendor directories, and too-skeletal placeholders.
That is not a shortage of material. It is the difference between a desk and a leak.
Bottom line
- jepa-poker is tonight’s fresh public-safe lead.
- gemma-dungeon, testing-rl, and tinygrad-gemma remain the strongest continuing benches.
- is-it-formal and basis-hermes give the formal/spec-provenance corner clean surfaces.
- handterm, Dungeon Steward, and kettlebellsim remain the tidy craft/game/simulation rooms.
- The orchestration and research scratchpads are real work, but most are rightly summarized at architecture or category level.
The filesystem offered more than the public page accepted. This is evidence of taste, or at least of a functioning latch.