Nightly Src Projects Desk Raw Survey (2026-05-27)

This raw note preserves the public-safe basis for the 2026-05-27 nightly src/ projects desk. It uses inspectable local evidence only: README/docs/plans, manifests, branch/status/log metadata, safe modified/untracked filenames, mtimes, tests, checked-in reports, and visible artifacts. It does not publish secret-bearing files, .env contents, hidden local settings, raw prompts/logs/trajectories, private corpus bodies, evaluator/oracle payloads, raw benchmark bodies, checkpoint/model artifacts, biometric/capture data, generated media bodies, or explicit/provocative material.

Where a directory is local-only, sensitive, artifact-heavy, private-corpus-backed, or too skeletal, this note uses category-level wording. The aim is provenance, not rummaging. Rummaging is for raccoons and occasionally for build systems; neither is an editorial standard.

Survey scope and method

Survey root: /Users/ericfode/src.
Survey timestamp: 2026-05-27 01:36 PDT.
Full top-level directory count: 50, including hidden directories.
Execution shape: exactly 10 top-level Hermes survey lane identities, dispatched as one batch of ten orchestrator lanes.
Lane coverage audit: controller enumeration found 50 assigned directories, 50 unique assignments, no missing directories, no extras, and no duplicates.
Lane recursion: all 10 lanes reported spawning exactly three read-only subteams for purpose/docs/manifests, live-work evidence, and public-safety/public-summary eligibility. Each subteam reported a further three-way leaf probe. The recorded depth is lane → subteams → leaves; no deeper recursion is claimed.
Evidence allowed: README/docs/plans, manifests, branch/status/log metadata, safe modified/untracked filenames, mtimes, tests, checked-in reports, and visible artifacts.
Evidence excluded: secret contents, .env contents, hidden local settings, raw prompts/logs/trajectories, hidden evaluator/supervisor payloads, private corpus bodies, explicit/provocative/unsafe material, raw benchmark bodies, checkpoints/model artifacts, biometric/capture data, generated media bodies, and directories too skeletal for responsible public claims.
Illustration: generated editorial art was saved at queries/news-assets/2026-05-27-project-desk-hero.svg as deterministic symbolic SVG after rejecting raster drafts with text artifacts. It is an illustration, not a screenshot.
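
The allowed/excluded evidence split above can be read as a path filter. The following is a minimal sketch in Python, not the survey's actual tooling: every pattern, directory name, and the classify helper are hypothetical, chosen only to illustrate checking exclusions before allowances and falling back to category-level wording for anything unrecognized.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical patterns: the real survey rules are not recorded in this note.
EXCLUDED_PATTERNS = [".env", ".env.*", "*.ckpt", "*.safetensors", "*.gguf", "*.log"]
EXCLUDED_DIR_NAMES = {"checkpoints", "logs", "trajectories", "prompts"}
ALLOWED_PATTERNS = ["README*", "pyproject.toml", "Cargo.toml", "package.json", "*.md"]
ALLOWED_DIR_NAMES = {"docs", "plans", "tests"}


def classify(path: str) -> str:
    """Return 'excluded', 'allowed', or 'category-only' for a relative path."""
    p = PurePosixPath(path)
    name = p.name
    parents = set(p.parts[:-1])
    # Exclusion is checked first, so a README inside logs/ is still withheld.
    if parents & EXCLUDED_DIR_NAMES or any(fnmatchcase(name, pat) for pat in EXCLUDED_PATTERNS):
        return "excluded"
    if parents & ALLOWED_DIR_NAMES or any(fnmatchcase(name, pat) for pat in ALLOWED_PATTERNS):
        return "allowed"
    # Anything unrecognized falls back to category-level wording (an assumption).
    return "category-only"
```

Under these assumed rules, classify("logs/README.md") is withheld even though README files are otherwise allowed, which mirrors the note's preference for exclusion whenever the two lists collide.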

Ten survey lanes

The exact top-level lane count was 10. One provocative/protected-class-sensitive directory in lane 01 is intentionally withheld by name; it was surveyed and counted, but not publicized.

Lane 01: .claude; .socket-dev-scan; .tinygrad_research; another-harness; one sensitive social-claim directory withheld by name.
Lane 02: basis; basis-hermes; basis-jcode; cardgame1; creative.
Lane 03: deer-flow; FACEMUSIC; gas-city-but-its-just-codex; gemma-dungeon; gemma4-tinygrad-opt.
Lane 04: handterm; hoid; iii-wiki; is-codex-better; is-it-formal.
Lane 05: jepa-expriments; jepa-lang; jepa-poker; justfooln; kettlebellsim.
Lane 06: kimi-tests; langfuse; llama.cpp; local-hermes; meta-hermes.
Lane 07: nnpl-external-latent-bus; nnpl-shared-bus; nnpl-typed-boundary-ir; openai-symphony; overengineeredlife.
Lane 08: parenting-bookshelf-compass; quiz; silly-pi-stuff; spec-dataset-evolution-corpus; src.
Lane 09: steward; testing-rl; testing-rl-hermes; textual-world-model; tinygrad.
Lane 10: tinygrad-gemma; tinygrad-gemma-gemini; tinygrad-gemma-kimi; unconventional-jepa-lab; word-games.
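
The lane coverage audit described under survey scope (50 assigned, 50 unique, nothing missing, extra, or duplicated) can be reproduced from the lane lists above. This is a sketch under stated assumptions: the audit function and the "<withheld>" placeholder are hypothetical, and the enumerated list would normally come from listing the survey root rather than from the lanes themselves.

```python
from collections import Counter

# Lane assignments as listed above; the withheld directory is a placeholder.
LANES = [
    [".claude", ".socket-dev-scan", ".tinygrad_research", "another-harness", "<withheld>"],
    ["basis", "basis-hermes", "basis-jcode", "cardgame1", "creative"],
    ["deer-flow", "FACEMUSIC", "gas-city-but-its-just-codex", "gemma-dungeon", "gemma4-tinygrad-opt"],
    ["handterm", "hoid", "iii-wiki", "is-codex-better", "is-it-formal"],
    ["jepa-expriments", "jepa-lang", "jepa-poker", "justfooln", "kettlebellsim"],
    ["kimi-tests", "langfuse", "llama.cpp", "local-hermes", "meta-hermes"],
    ["nnpl-external-latent-bus", "nnpl-shared-bus", "nnpl-typed-boundary-ir", "openai-symphony", "overengineeredlife"],
    ["parenting-bookshelf-compass", "quiz", "silly-pi-stuff", "spec-dataset-evolution-corpus", "src"],
    ["steward", "testing-rl", "testing-rl-hermes", "textual-world-model", "tinygrad"],
    ["tinygrad-gemma", "tinygrad-gemma-gemini", "tinygrad-gemma-kimi", "unconventional-jepa-lab", "word-games"],
]


def audit(lanes, enumerated):
    """Compare lane assignments against an independent directory enumeration."""
    assigned = [d for lane in lanes for d in lane]
    counts = Counter(assigned)
    return {
        "assigned": len(assigned),
        "unique": len(counts),
        "missing": sorted(set(enumerated) - set(counts)),
        "extras": sorted(set(counts) - set(enumerated)),
        "duplicates": sorted(d for d, n in counts.items() if n > 1),
    }


# Stand-in enumeration: a real run would list the survey root instead.
enumerated = [d for lane in LANES for d in lane]
report = audit(LANES, enumerated)

# Reported recursion shape: 3 subteams per lane, 3 leaf probes per subteam.
subteams = len(LANES) * 3
leaf_probes = subteams * 3
```

Taking each lane's self-report at face value, the recorded shape implies 30 subteams and 90 leaf probes across the ten lanes. The audit is only meaningful when enumerated is gathered independently of the lane assignments, which the stand-in above deliberately does not do.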

Public-safe lead candidates

Same-night game/world-model and JEPA bench

gemma-dungeon is the strongest same-night technical lead. Evidence: README, pyproject, specs, schemas, tests, and same-day git activity around explicit symbolic game state, Gemma/tinygrad-facing projections, replay/loss contracts, and model advisory surfaces. Safe summary: embedding-native roguelike/world-model research where symbolic state remains authoritative and model outputs remain auditable. Hold back raw replay/example JSON, generated packs, checkpoints/model artifacts, private corpora, and archived local plans.
jepa-lang is the cleanest new small artifact. Evidence: README, pyproject, docs/spec, source/tests, and a May 18 implementation burst. Safe summary: a deterministic executable IR for language-model-adjacent cognition, with typed replay state, evidence receipts, inert latent slots, and deterministic validation/replay.
jepa-poker is now public-summary eligible with framing. Evidence: README, pyproject, research-goal docs, online learning docs, source/tests/scripts, and May 26 activity around Leduc/Kuhn-style experiments. Safe summary: toy imperfect-information poker as a JEPA-style world-model bench with exact rule engines and legal-action constraints. Do not frame it as gambling automation or real-money tooling.
word-games remains a useful side-room: README, GOAL, pyproject, tests, scripts, and package files describe Story JEPA / character-interiority modeling with frozen text embeddings. Generated runs/checkpoints remain excluded.
unconventional-jepa-lab has strong top-level public-facing docs for ten local JEPA/world-model lanes, gates, and evidence manifests, but hidden local state and secret-bearing filenames make whole-tree publication unsafe. Use only sanitized top-level-doc summaries if needed.

Verifier, formal, and model-runtime benches

testing-rl remains the best verifier/test-generation front-page candidate. Evidence: clean git tree, branch ahead by three commits, README, pyproject, docs/formal material, recent verifier/dashboard/ranking-lift commits, and documented gates. Safe summary: an RL environment for agents or models that write high-value tests while evaluator-held references remain hidden. Hold back benchmark JSON, hidden references/oracles, score payloads, and any unsupported trained-model claims.
testing-rl-hermes is a side-room companion: deterministic test-generation RL/grading with evaluator-owned reference/mutant/oracle material. It has good docs and manifests but remains fixture-heavy.
is-it-formal remains a small public-safe Lean-backed side-room for grading how formal a claim is across domains.
tinygrad-gemma is technically rich and well-documented, but the local tree is dirty and artifact-heavy, with benchmark/log/checkpoint/model/cache/evolution-state surfaces. It should be summarized only at a sanitized model-runtime level.
tinygrad-gemma-kimi is a low-confidence side-room prototype around Gemma 4-on-tinygrad benchmarking/patches; the absence of a README or manifest, together with the presence of patch/debug artifacts, keeps it off the front page.

Orchestration, harnesses, and control planes

deer-flow is a clean public front-page candidate as an open-source research/super-agent harness with backend/frontend manifests, Docker/Makefile surfaces, and public documentation. Exclude local config and runtime/demo artifacts.
openai-symphony is a front-page candidate with exclusions: README/SPEC/Elixir docs/manifests support an issue-to-agent-workspace orchestration preview, but hidden Codex/skill surfaces, local runtime material, and generated outputs stay private.
gas-city-but-its-just-codex is a substantial side-room: README, Rust/Swift manifests, schemas, specs, architecture docs, benchmarks, and same-week modified files support a Codex-native durable orchestration/control-plane summary. State/log/benchmark/wiki-source/generated artifacts must remain excluded.
basis and basis-hermes remain coherent spec/provenance side rooms: Basis as structured spec-state custody, and basis-hermes as a Hermes plugin/dashboard wrapper. basis-jcode stays held back because .basis ledgers, packets, prompts, streams, and dashboard/runtime bodies dominate the boundary.
steward is active architecture/provenance work, but dirty/uncommitted service/kernel material and private-corpus/log-ingest/benchmark/config surfaces keep it side-room only.

Craft, interface, humane tools, and upstream substrates

parenting-bookshelf-compass is the cleanest public artifact tonight: a static GitHub Pages quiz with README, index page, 60 questions across 13 categories, 10 cited sources, explicit non-diagnostic disclaimers, clean git status, and no network-send APIs detected. It is not an agent-harness project, but it is public-safe and unusually tidy.
handterm remains the craft lead: a clean Rust/Wayland terminal workspace with README, Cargo workspace, CPU/GPU renderers, tests, and recent kitty-graphics/helper refactor history.
llama.cpp and .tinygrad_research are clean public upstream/reference substrates rather than local original project leads.
silly-pi-stuff is a side-room personal prototype: README/package evidence supports a local companion / dashboard / octonion-surface experiment summary, but the package's private: true flag and hidden prompt/config-like material keep it away from the front page.
kettlebellsim remains a substantial simulation side-room: pyproject, runbooks, plans, tests, and recent bounded simulator/probe-guard work support a high-level summary, while cloud/sim execution details, logs, trajectories, rollouts, and generated media stay private.

Held back from project-specific public detail

The survey either fully held back or reduced to category-only mention the following: hidden local settings, security/dependency scan artifacts, empty or too-skeletal directories and placeholders, one provocative/protected-class-sensitive social-claim notebook, local deployment/model-runner folders, private corpus bodies, prompt/agent/skill instruction bodies, scratch/meta workspaces, generated media, raw logs/prompts/trajectories, evaluator/oracle payloads, raw benchmark outputs, model/checkpoint artifacts, biometric/capture data, creative/canon/world-packet drafts, service configuration, raw test/counterexample bodies, local .env-style material, cache/build/vendor directories, and dirty patch/reject variants.

Editorial synthesis

The publishable movement tonight clusters around five claims:

gemma-dungeon, jepa-lang, and jepa-poker make the game/world-model and JEPA research line the most active technical desk.
testing-rl remains the sturdy verifier/test-generation bench, with testing-rl-hermes as a more evaluator-heavy side room.
deer-flow, openai-symphony, gas-city-but-its-just-codex, basis, and steward form the orchestration/provenance/control-plane room, but only their public docs and manifests are publishable.
parenting-bookshelf-compass is the tidy public artifact of the night; handterm, kettlebellsim, and selected NNPL projects are useful side rooms.
The tree contains more activity than the public page should accept. The safety filter did not fail; it performed the small civic duty of not turning a filesystem into a confessional.
