Spec Deep-Dive: jcode
Question
Why should 1jehuang/jcode be treated as a priority case study for the
spec-dataset-evolution-research-project rather than as just another Markdown-heavy
repository?
Short answer
jcode is a post-LLM coding-agent harness whose specification surface is
not one canonical spec.md, but a distributed system of architecture notes,
performance plans, audits, protocols, agent instructions, behavior prompts,
safety policies, and product-surface designs. It is therefore a calibration case
for the difference between llm-readable-spec-files as a clean ideal and real
AI-era repos whose contracts are scattered through the workspace like well-meaning
sparrows.
The public-safe conclusion is: include 1jehuang/jcode as a priority calibration
seed for distributed spec surfaces and high code/spec connectedness, but
keep raw-content export blocked pending policy review.
Source basis
| Claim scope | Private corpus source | Public upstream reference | Evidence fields used | Caveat |
|---|---|---|---|---|
| Identity and classification | reports/deep-dives/jcode.md; reports/jcode_first_calibration_seed.md | https://github.com/1jehuang/jcode | repo URL, repository description, topics, primary language, created/pushed dates, observed default branch | The private dossier was used as evidence; no raw corpus file body is reproduced here. |
| Current inspected revision | reports/deep-dives/jcode.md; reports/deep-dives/jcode-analysis.json | commit 36b46fcd32d724bc3eaaabaf88e79e23e21980c2 | local clone metadata, HEAD subject, commit count, history span | This is a snapshot at corpus HEAD 4659608; current GitHub state may have moved. |
| Distributed spec surface | reports/deep-dives/jcode.md; reports/jcode_first_calibration_seed.md | paths such as AGENTS.md, src/prompt/system_prompt.md, docs/COMPILE_PERFORMANCE_PLAN.md, docs/MODULAR_ARCHITECTURE_RFC.md | selected spec-like paths, path classes, LOC, touch counts, artifact labels | README is mixed product/marketing/spec material; it must be section-classified rather than counted wholesale. |
| Code/spec connectedness | reports/deep-dives/jcode.md; reports/deep-dives/jcode-analysis.json | same inspected repo and commit | spec-like commit count, doc/code co-change examples, path/command references, prompt/instruction files | Co-change is evidence of connection, not proof of causal governance. |
| AI-era timing and pressure | reports/deep-dives/jcode.md; reports/jcode_first_calibration_seed.md | repo created 2026-01-05; release v0.11.10 observed 2026-05-04 | first commit date, release density, pressure-category counts, topics | Post-LLM timing is a repo-era/product-domain label, not evidence that any specific file was AI-generated. |
Why this case matters
The dataset needs examples that falsify simplistic miners. jcode does this in
both directions:
- an exact-spec.md miner would miss almost everything important;
- an all-Markdown miner would inflate the corpus with product prose, screenshots, demos, operations notes, and marketing material;
- a better miner has to classify artifact families: architecture, RFC, plan, audit, protocol, prompt, safety policy, agent operating instructions, and mixed README sections.
That makes jcode an anchor for the project’s real question: which spec-like
artifacts actually constrain code and agent behavior, and which are just decorative
context with nicer typography?
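The two failure modes can be sketched with a pair of deliberately naive miners. This is an illustration only, not the project's crawler: the function names are hypothetical, the first five paths are ones named on this page, and the last path is an invented stand-in for demo material.

```python
from pathlib import PurePosixPath

# First five paths are named on this page; the last one is hypothetical.
PATHS = [
    "AGENTS.md",
    "src/prompt/system_prompt.md",
    "docs/COMPILE_PERFORMANCE_PLAN.md",
    "docs/MODULAR_ARCHITECTURE_RFC.md",
    "README.md",
    "docs/demo_walkthrough.md",  # hypothetical product/demo prose
]


def exact_spec_md_miner(paths):
    """Keep only files literally named spec.md (case-insensitive)."""
    return [p for p in paths if PurePosixPath(p).name.lower() == "spec.md"]


def all_markdown_miner(paths):
    """Keep every Markdown file, regardless of what it contains."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() == ".md"]


exact_hits = exact_spec_md_miner(PATHS)
markdown_hits = all_markdown_miner(PATHS)

# The exact miner finds nothing; the all-Markdown miner keeps everything,
# including the README and the demo prose.
print(len(exact_hits), len(markdown_hits))
```

Neither result is usable on its own: the first drops every contract-bearing file, and the second cannot tell AGENTS.md from a demo walkthrough.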
Distributed spec surface
The jcode calibration seed gives these high-priority classes:
| Artifact class | Example paths | Dataset treatment |
|---|---|---|
| Agent operating instructions | AGENTS.md; src/prompt/system_prompt.md | Include as normative behavior contracts, because they address future agents/developers and the harness itself. |
| Performance/resource plan | docs/COMPILE_PERFORMANCE_PLAN.md; docs/MEMORY_BUDGET.md; docs/TERMINAL_BENCH.md | Include when commands, budgets, thresholds, or benchmark gates are explicit. |
| Runtime architecture | docs/SERVER_ARCHITECTURE.md; docs/MEMORY_ARCHITECTURE.md; desktop/iOS architecture notes | Include when module boundaries, state ownership, lifecycle invariants, or process topology are named. |
| RFC / migration plan | docs/MODULAR_ARCHITECTURE_RFC.md; split/server-service plans | Include as proposed/refactor contracts with status and validation fields when available. |
| Contract audit | docs/PROVIDER_SESSION_SHARED_CONTRACT_AUDIT.md; code/mobile audits | Include as measurement/review artifacts, distinct from prospective design. |
| Protocol and safety policy | docs/BROWSER_PROVIDER_PROTOCOL.md; TELEMETRY.md; docs/SAFETY_SYSTEM.md; docs/AMBIENT_MODE.md | Include when the text constrains external integration, event shape, permissions, reversibility, or autonomy. |
| Mixed README/product surface | README.md; desktop/mobile/figma docs | Include only with section-level or manual classification. |
This is the same authority-layer problem that appears in context-engineering: models and humans both need to know which text is contract, rationale, policy, example, or ambient color.
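A minimal sketch of the table above as a lookup, assuming an exact-path mapping is enough for a calibration seed. The class labels, treatment labels, and the classify function are hypothetical names chosen for illustration; only the paths and their grouping come from the table. Unknown paths are left unclassified rather than guessed.

```python
# Paths and groupings follow the table above; label spellings are hypothetical.
ARTIFACT_CLASS_BY_PATH = {
    "AGENTS.md": "agent_operating_instructions",
    "src/prompt/system_prompt.md": "agent_operating_instructions",
    "docs/COMPILE_PERFORMANCE_PLAN.md": "performance_resource_plan",
    "docs/MEMORY_BUDGET.md": "performance_resource_plan",
    "docs/TERMINAL_BENCH.md": "performance_resource_plan",
    "docs/SERVER_ARCHITECTURE.md": "runtime_architecture",
    "docs/MEMORY_ARCHITECTURE.md": "runtime_architecture",
    "docs/MODULAR_ARCHITECTURE_RFC.md": "rfc_migration_plan",
    "docs/PROVIDER_SESSION_SHARED_CONTRACT_AUDIT.md": "contract_audit",
    "docs/BROWSER_PROVIDER_PROTOCOL.md": "protocol_safety_policy",
    "TELEMETRY.md": "protocol_safety_policy",
    "docs/SAFETY_SYSTEM.md": "protocol_safety_policy",
    "docs/AMBIENT_MODE.md": "protocol_safety_policy",
    "README.md": "mixed_readme_product_surface",
}

# Classes that the table only admits with section-level or manual review.
SECTION_LEVEL_CLASSES = {"mixed_readme_product_surface"}


def classify(path):
    """Return (artifact_class, treatment) for a repository path."""
    artifact_class = ARTIFACT_CLASS_BY_PATH.get(path, "unclassified")
    if artifact_class == "unclassified":
        treatment = "hold_for_review"
    elif artifact_class in SECTION_LEVEL_CLASSES:
        treatment = "section_level_review"
    else:
        # Inclusion is still conditional on the criteria in the table.
        treatment = "include_if_criteria_met"
    return artifact_class, treatment


for sample in ("AGENTS.md", "README.md", "assets/demo.json"):
    print(sample, classify(sample))
```

A real classifier would need pattern rules and content checks; the exact-path table is kept here because it cannot silently over-match.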
Connectedness evidence
The corpus dossier records 39 selected spec-like files, 14,209 spec-like LOC,
348,326 Rust LOC, and 2,953 commits over 120 observed days. It also records:
| Metric | Value |
|---|---|
| Selected spec-like commits | 298 |
| Fraction of all commits touching selected spec-like paths | 0.101 |
| Spec-like commits also touching code | 130 |
| Spec-like commits also touching tests | 26 |
| Spec/code co-change fraction among spec-like commits | 0.436 |
Representative anchors from the dossier include architecture or telemetry docs
co-changing with CLI/server/TUI code, a desktop prototype document co-changing
with crates/jcode-desktop/src/main.rs, and multi-session architecture notes
co-changing with TUI workspace-map code.
The interpretation should stay modest: this is strong connectedness evidence, not a theorem of design causality. The point is to make a measurable surface for evaluation-and-review-loops, not to make the spreadsheet feel heroic.
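The two fractions in the table follow directly from the recorded counts. The sketch below reproduces that arithmetic; the variable names are illustrative, and the test co-change fraction is derived here from the same counts rather than taken from the dossier.

```python
# Counts as recorded in the corpus dossier (see the table above).
total_commits = 2953
spec_commits = 298
spec_commits_touching_code = 130
spec_commits_touching_tests = 26

# Fraction of all commits that touch a selected spec-like path.
spec_touch_fraction = round(spec_commits / total_commits, 3)

# Fraction of spec-like commits that also touch code in the same commit.
code_co_change_fraction = round(spec_commits_touching_code / spec_commits, 3)

# Derived here for comparison; not a figure reported by the dossier.
test_co_change_fraction = round(spec_commits_touching_tests / spec_commits, 3)

print(spec_touch_fraction, code_co_change_fraction, test_co_change_fraction)
```

Both reported values (0.101 and 0.436) match the counts, so the table is internally consistent.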
AI-era timing and pressure
jcode begins in January 2026, after the public LLM coding-agent wave, and its
observed topics and architecture include AI, Claude/OpenAI provider surfaces,
MCP, browser automation, swarm/agent coordination, memory, ambient mode, and
self-development. The dataset label should therefore be post_llm_agent_wave or
ai_native_agent_harness.
Pressure labels from the dossier include TUI/UX, fix/bug, multi-session/server, provider/auth, performance/resource, agent coordination, testing/safety, platform/product-surface, documentation/spec, and refactor/architecture pressure. These are descriptive commit-message classes. They should not be laundered into a historical popularity or causality timeline.
Limitations and publication boundary
- Raw content export remains review_required; this page uses synthesis and metadata only.
- The local dossier observed a mismatch between release/tag data and the Cargo package version; release derivation should use tags/releases rather than Cargo.toml alone.
- Current stars/forks are current-snapshot evidence, not historical pressure.
- Generated assets, demo JSON, package manifests, CI files, screenshots, videos, and fixtures are negative/control families unless a separate policy promotes them as executable contracts.
- README.md is useful but dangerous: classify it by section, not by aura.
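The release-derivation rule in the list above can be sketched as follows, assuming tags follow a vMAJOR.MINOR.PATCH shape. The function names are hypothetical, and apart from v0.11.10 (the release observed in the source table) the tag list and the Cargo version are invented to exercise the logic; the dossier's actual mismatch values are not reproduced here.

```python
import re

# Assumed tag shape: optional "v" prefix, then three numeric components.
VERSION_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(text):
    """Return (major, minor, patch) as ints, or None if the text does not match."""
    match = VERSION_PATTERN.match(text.strip())
    return tuple(int(part) for part in match.groups()) if match else None


def derive_release(tags, cargo_version=None):
    """Prefer the highest version-shaped tag; fall back to Cargo.toml only if no tag parses."""
    parsed = [(parse_version(tag), tag) for tag in tags]
    parsed = [(version, tag) for version, tag in parsed if version is not None]
    if not parsed:
        return {"release": cargo_version, "source": "cargo_toml_fallback", "cargo_mismatch": False}
    version, tag = max(parsed)  # numeric comparison, so 0.11.10 > 0.11.9
    mismatch = cargo_version is not None and parse_version(cargo_version) != version
    return {"release": tag, "source": "tag", "cargo_mismatch": mismatch}


# Hypothetical inputs, except for the observed v0.11.10 release.
result = derive_release(["v0.11.9", "v0.11.10", "nightly"], cargo_version="0.11.0")
print(result)
```

Comparing parsed integer tuples matters here: a plain string sort would rank v0.11.9 above v0.11.10.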
Dataset implication
Use jcode to calibrate the high-connectedness end of the corpus. The crawler
should preserve exact path classes, same-commit co-change, path and command
references, prompt/instruction files, status fields, and adjacent negative
controls. This page is one public-safe slice of spec-deep-dive-wiki-ingest-project;
the raw dossier stays in the private corpus.
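One way to picture what the crawler should preserve is a per-artifact record. The dataclass below is a sketch under assumptions, not the project's schema: every field name is hypothetical and simply mirrors the items listed above, with the export status defaulting to review_required as stated in the limitations.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SpecArtifactRecord:
    """Hypothetical per-artifact record; field names are illustrative."""

    repo: str
    commit: str
    path: str
    path_class: str                      # exact path class, not a coarse "docs" bucket
    co_changed_paths: List[str] = field(default_factory=list)      # same-commit co-change
    referenced_paths: List[str] = field(default_factory=list)      # paths named in the text
    referenced_commands: List[str] = field(default_factory=list)   # commands named in the text
    is_prompt_or_instruction: bool = False
    status: Optional[str] = None         # status field, if the artifact declares one
    negative_control_neighbors: List[str] = field(default_factory=list)
    raw_content_export: str = "review_required"


# Repo, commit, and path come from this page; the class label is hypothetical.
record = SpecArtifactRecord(
    repo="1jehuang/jcode",
    commit="36b46fcd32d724bc3eaaabaf88e79e23e21980c2",
    path="AGENTS.md",
    path_class="agent_operating_instructions",
    is_prompt_or_instruction=True,
)

print(asdict(record)["raw_content_export"])
```

Keeping negative-control neighbors on the same record makes it harder for a later pipeline stage to drop the controls while keeping the positives.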
Deep-dive navigation
- Aggregate index: spec-deep-dive-index
- Priority cases: spec-deep-dive-case-jcode, spec-deep-dive-case-droidagent, spec-deep-dive-case-j8-ambiguity
- Cohort pages: spec-deep-dive-cohort-exact-spec-md-and-standards, spec-deep-dive-cohort-agent-native-spec-kit-kiro, spec-deep-dive-cohort-rfc-adr-executable-contracts