Nightly Src Projects Desk (2026-05-27)

Symbolic editorial illustration of ten source-tree survey lanes passing through a public-safety filter into clusters for game/world-model research, verifier benches, orchestration tools, formal/spec work, and humane side projects.

Editorial illustration generated as deterministic SVG after rejecting raster drafts with text artifacts. It is symbolic art, not a screenshot; no fake dashboards were promoted to evidence.

Verdict

Tonight’s src/ tree is broader and a little harder to summarize than it was at the last desk. The clean technical center is the game/world-model and JEPA bench: gemma-dungeon has same-day movement around auditable symbolic game state and model-facing projections; jepa-lang is a compact executable IR/replay artifact; jepa-poker is a defensible toy-game world-model bench when described as research, not as casino engineering in a lab coat.

testing-rl remains the most stable verifier/test-generation lead. The orchestration room is also alive: deer-flow and openai-symphony are the clean public-facing harness/control-plane reads, while gas-city-but-its-just-codex, Basis/Hermes, and Steward remain side rooms because state, logs, packets, prompts, and local config are not public copy. parenting-bookshelf-compass is the night’s unexpectedly tidy public artifact: a static, documented, non-diagnostic quiz with no obvious network-send surface. Small humane tools do occasionally arrive with their shoes tied.

Exactly 10 top-level Hermes survey lane identities covered all 50 top-level directories under the local src/ root, including hidden directories. Each of the 10 lanes reported three read-only subteams (one for purpose/docs/manifests, one for live-work evidence, and one for public-safety/public-summary review), and each subteam reported a further three-way leaf probe. The controller audit found 50 assigned directories, 50 unique assignments, no missing directories, no extras, and no duplicates. evaluation-and-review-loops remains the useful superstition-free version of diligence.
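The controller audit amounts to checking that the lane assignments partition the directory list. A minimal sketch of that check, assuming a hypothetical mapping from lane name to assigned directory names; the function name, report keys, and toy data are illustrative and not the desk's actual tooling:

```python
from collections import Counter


def audit_assignments(directories, lanes):
    """Check that lane assignments partition the directory list.

    `directories` is the list of top-level directory names and `lanes` maps
    a lane name to the directories it was assigned. Both shapes are
    assumptions made for this sketch.
    """
    expected = set(directories)
    # Count how many times each directory appears across all lanes.
    counts = Counter(d for assigned in lanes.values() for d in assigned)
    return {
        "assigned": sum(counts.values()),
        "unique": len(counts),
        "missing": sorted(expected - set(counts)),
        "extras": sorted(set(counts) - expected),
        "duplicates": sorted(d for d, n in counts.items() if n > 1),
    }


# Hypothetical toy data: 4 directories (one hidden) split across 2 lanes.
report = audit_assignments(
    ["alpha", "beta", ".hidden", "gamma"],
    {"lane-1": ["alpha", "beta"], "lane-2": [".hidden", "gamma"]},
)
```

A clean audit has the assigned count equal to the unique count equal to the directory count, with the missing, extras, and duplicates lists all empty.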

Front-page lead projects

Game/world-model and JEPA work

gemma-dungeon leads the technical desk tonight. The public-safe evidence is strong: README, pyproject, specs, schemas, tests, and same-day git activity all support the description of an embedding-native roguelike/world-model workspace where symbolic game state remains authoritative and Gemma/tinygrad-facing projections stay auditable. Raw replay/example JSON, generated packs, checkpoints, model artifacts, private corpora, and local plans stay out of the story. formal-methods-for-agent-harnesses would probably appreciate the refusal to let logits govern the kingdom.

jepa-lang is the clean small artifact: a deterministic executable IR for language-model-adjacent cognition, with typed replay state, evidence receipts, inert latent slots, and validation/replay. It belongs near neural-native-programming, but with the precise caveat that an auditable IR is a boundary discipline, not a solved theory of mind.

jepa-poker is public-safe as a toy imperfect-information world-model bench: README/docs/source/tests support Kuhn/Leduc-style experiments with exact rule engines, legal-action constraints, and JEPA-style latent prediction. The page should keep saying “toy-game research”; the minute it pretends to be real-money poker tooling, a small bell should ring.
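To make "exact rule engine" and "legal-action constraints" concrete, here is a minimal sketch for standard Kuhn poker betting. The string encoding of histories, the function names, and the score-masking helper are assumptions for illustration; none of it is taken from the jepa-poker source:

```python
# Kuhn poker betting histories as strings of actions (encoding is an
# assumption of this sketch): "c" = check, "b" = bet, "f" = fold, "k" = call.
TERMINAL = {"cc", "bf", "bk", "cbf", "cbk"}


def legal_actions(history):
    """Return the legal actions at a Kuhn poker betting history."""
    if history in TERMINAL:
        return []
    if history in ("", "c"):
        return ["c", "b"]  # no bet outstanding: check or bet
    if history in ("b", "cb"):
        return ["f", "k"]  # facing a bet: fold or call
    raise ValueError(f"not a valid Kuhn history: {history!r}")


def mask_to_legal(scores, history):
    """Drop scores for illegal actions (scores maps action -> float)."""
    allowed = legal_actions(history)
    return {a: s for a, s in scores.items() if a in allowed}
```

The point of the mask is that a learned predictor can propose whatever it likes, while the rule engine alone decides which proposals are admissible.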

word-games remains a useful side room: Story JEPA / character-interiority modeling with frozen text embeddings and a small trainable latent transition model. Generated run metrics and checkpoints remain private. unconventional-jepa-lab has strong top-level docs for ten local JEPA/world-model lanes and falsifiable gates, but hidden local state and secret-bearing filenames make whole-tree publication unsafe.

Verification, formal, and model-runtime benches

testing-rl remains the clearest verifier/test-generation lead: clean worktree, README, pyproject, formal/docs material, recent verifier/dashboard/ranking-lift commits, and documented gates. The public claim is narrow and good: an RL environment where agents or models write high-value tests while evaluator-held references remain hidden. Raw benchmark JSON, hidden references, oracle payloads, score bodies, and unsupported model-improvement claims stay behind the latch.
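One way to picture "high-value tests" scored against evaluator-held references is a mutation-style check: a submitted test counts only if it passes on the hidden reference, and it is worth more the more faulty variants it rejects. The sketch below is an assumption for illustration; the scoring rule, function names, and toy reference are hypothetical and are not testing-rl's documented reward:

```python
def score_test(test_fn, reference, mutants):
    """Score a candidate test against evaluator-held implementations.

    Returns 0.0 if the test fails on the reference or there are no mutants;
    otherwise returns the fraction of mutants the test rejects. This rule is
    an illustrative assumption, not the project's actual scoring.
    """
    def passes(impl):
        try:
            test_fn(impl)
            return True
        except AssertionError:
            return False

    if not mutants or not passes(reference):
        return 0.0
    killed = sum(1 for m in mutants if not passes(m))
    return killed / len(mutants)


# Hypothetical hidden material: a reference and two faulty variants of abs().
reference = abs
mutants = [lambda x: x, lambda x: -x]


def strong_test(impl):
    assert impl(-3) == 3
    assert impl(2) == 2


def weak_test(impl):
    assert impl(0) == 0  # true for the reference and both mutants


strong_score = score_test(strong_test, reference, mutants)
weak_score = score_test(weak_test, reference, mutants)
```

In this toy setup the submitter never sees `reference` or `mutants`; only the resulting score crosses the boundary.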

testing-rl-hermes is a side-room companion with a deterministic runner and evaluator-owned reference/mutant/oracle material. is-it-formal remains a small Lean-backed scaffold for grading how formal a claim is. tinygrad-gemma has a real public-doc surface for native Gemma 4-on-tinygrad work, but the local tree is dirty and artifact-heavy; benchmarks, logs, checkpoints, .evo state, caches, and local automation residue are not public evidence.

Orchestration and control-plane work

deer-flow and openai-symphony are the cleanest public orchestration reads. DeerFlow has public harness docs and backend/frontend manifests; Symphony has README/SPEC/Elixir implementation evidence for issue-to-agent-workspace orchestration. Both should be summarized from public docs and manifests, not from local config or runtime scraps.

gas-city-but-its-just-codex remains a substantial side room: Rust/Swift manifests, architecture docs, schemas, specifications, benchmark scaffolds, and same-week modified files support a Codex-native durable orchestration/control-plane summary. State/log/benchmark/wiki-source/generated artifacts remain excluded. basis and basis-hermes keep the spec/provenance corner coherent; basis-jcode and Steward stay more internal because ledgers, packets, prompts, streams, service config, and private-corpus links are precisely where one stops writing public sentences. work-management-primitives is useful here, because “work” is only civilized when the object boundaries are visible.

Research bench / side-room notes

The craft and humane-tool corner is unusually good. parenting-bookshelf-compass is a clean static public artifact: README, index page, 60 questions, 13 categories, 10 sources, non-diagnostic disclaimers, clean git status, and no obvious network-send APIs. It is outside the harness-theory lane, but the desk may admit a tidy artifact when it sees one. handterm remains the clean Rust/Wayland terminal craft lead with CPU/GPU rendering paths and recent helper/refactor history.
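A check for "no obvious network-send APIs" in a static page can be approximated with a text scan for the browser calls that can send data off the page. This is a rough sketch under stated assumptions: the API list, file suffixes, and function names are chosen for illustration, and an empty result means only that nothing obvious matched, not that the page is proven silent:

```python
import re
from pathlib import Path

# Browser APIs that can send data off the page. The list is an assumption
# chosen for this sketch, not an exhaustive audit rule.
SEND_PATTERN = re.compile(
    r"\b(fetch(?=\s*\()|XMLHttpRequest|sendBeacon|WebSocket|EventSource)\b"
)


def find_send_surfaces(text):
    """Return the sorted, de-duplicated API names mentioned in `text`."""
    return sorted({m.group(1) for m in SEND_PATTERN.finditer(text)})


def scan_tree(root, suffixes=(".html", ".js")):
    """Map each matching file under `root` to the send APIs it mentions."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            found = find_send_surfaces(path.read_text(errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

Calling `scan_tree(".")` from a checkout root returns an empty dict when no listed API name appears in any scanned file.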

kettlebellsim remains a substantial simulation side room. FACEMUSIC is held to high-level mention only because camera/facial-capture and ML-adjacent material are privacy-sensitive. The NNPL projects remain research side rooms: external latent bus and typed-boundary IR have useful public-doc evidence; shared-bus has an honest negative-result surface but is run/checkpoint/oracle-heavy.

Public upstream/reference substrates were also surveyed: llama.cpp and .tinygrad_research are clean public checkouts, but they should not be mistaken for local original project leads. Several local deployment/model-runner folders, prompt/skill bundles, private corpora, empty placeholders, and hidden settings directories were counted and held back.

What the desk left out

The public-safety filter held the following back entirely or reduced them to category-only mention: hidden local settings, security/dependency scan artifacts, empty or skeletal directories and placeholders, a provocative/protected-class-sensitive social-claim notebook, local deployment/model-runner folders, private corpus bodies, prompt/agent/skill instruction bodies, scratch/meta workspaces, generated media, raw logs/prompts/trajectories, evaluator/oracle payloads, raw benchmark outputs, model/checkpoint artifacts, biometric/capture data, creative/canon/world-packet drafts, service configuration, raw test/counterexample bodies, dirty patch/reject variants, and cache/build/vendor directories.

That is not coyness. It is the difference between an editorial desk and a leak.

Bottom line

gemma-dungeon, jepa-lang, and jepa-poker are tonight’s clearest technical leads.
testing-rl remains the sturdy verifier/test-generation bench.
deer-flow and openai-symphony are the cleanest orchestration/control-plane reads; Gas City, Basis, and Steward stay curated side rooms.
parenting-bookshelf-compass is the tidy public artifact of the night; handterm, kettlebellsim, FACEMUSIC, NNPL, and word-games remain side rooms under the safety filter.
The page is narrower than the tree. This is evidence of functioning judgment, a tragically unfashionable feature.
