Spec Deep-Dive: RFC, ADR, and Executable Contracts

Question

How should the corpus model proposal-heavy repositories and executable contract repositories without flattening them into the same class as Markdown product specifications?

Short answer: RFCs, ADRs, and proposals are governance records; OpenAPI, Proto, Smithy, AsyncAPI, GraphQL, Thrift, TLA+, and Dafny are closer to executable or machine-checkable contracts. Both can be “specs,” but they put authority in very different places. A taxonomy that treats all of them as pleasant Markdown-shaped fog will not survive contact with a compiler, a standards body, or Kubernetes.

This page is part of spec-deep-dive-wiki-ingest-project and extends the public frame in spec-dataset-evolution-research-project. It publishes synthesis and metadata only; no private raw corpus bodies are copied here.

Source basis

| Claim scope | Private corpus source | Public upstream reference | Evidence fields used | Caveat |
| --- | --- | --- | --- | --- |
| RFC/proposal repositories are intentionally spec-primary and encode lifecycle in paths, templates, CI, and history. | reports/deep-dives/SPEC-REPO-05/index.md; per-repo JSON/Markdown dossiers; reports/deep-dives/SPEC-REPO-05/sources.md | https://github.com/rust-lang/rfcs, https://github.com/kubernetes/enhancements, https://github.com/open-telemetry/opentelemetry-specification, https://github.com/emberjs/rfcs, https://github.com/ethereum/EIPs | repo URL, inspected commit, spec inventory count, spec/code/test LOC, spec-changing commits, same-commit co-change, compliance status | ADR discovery was not recall-complete: web search failed and unauthenticated GitHub search hit HTTP 403; backstage/backstage was rejected/deferred as too large for this scout. |
| Executable contracts need artifact-aware classification rather than documentation heuristics. | reports/deep-dives/SPEC-REPO-06/index.md; reports/deep-dives/SPEC-REPO-06/*.json; data/connectedness_features.jsonl | https://github.com/open-telemetry/opentelemetry-proto, https://github.com/temporalio/api, https://github.com/smithy-lang/smithy, https://github.com/oapi-codegen/oapi-codegen, https://github.com/asyncapi/spec | contract file counts, code/spec/test LOC, same_commit_spec_code_count, contract path samples, API/search failures, redistribution status | asyncapi/spec and smithy-lang/smithy are hard-quarantined for raw content by the SPEC-REPO-16 gate because lightweight scans flagged secret-like example material. |
| GraphQL, Thrift, TLA+, Dafny, and a large OpenAPI generator broaden executable/formal coverage beyond the first contract scout. | reports/deep-dives/SPEC-REPO-18/index.md; reports/deep-dives/SPEC-REPO-18/validation_report.md; per-repo JSON/Markdown dossiers | https://github.com/graphql/graphql-spec, https://github.com/apache/thrift, https://github.com/tlaplus/tlaplus, https://github.com/dafny-lang/dafny, https://github.com/OpenAPITools/openapi-generator | clone mode, history coverage, formal/contract file counts, pygount probe, large sparse fallback, license/compliance status | GitHub code search returned HTTP 401; OpenAPI Generator used a blobless sparse fallback with limited first-parent history coverage. Several records are review-required despite public readability. |
| Non-GitHub public forges host real standards corpora, but discovery and license surfaces differ by host. | reports/deep-dives/SPEC-REPO-19/index.md; reports/deep-dives/SPEC-REPO-19/sources.md; per-repo JSON/Markdown dossiers | https://codeberg.org/ariadne/ariadne-identity-specification, https://git.sr.ht/~sircmpwn/hare-specification, https://bitbucket.org/openid/connect, https://gitlab.freedesktop.org/wayland/wayland-protocols | host-specific discovery endpoints, clone URLs, inspected commits, spec inventories, compliance review notes, sensitive-scan counts/classes | Codeberg, SourceHut, Bitbucket, and self-hosted GitLab were not treated as if they had GitHub-like global code search. Raw export remains metadata-only or review-required except Wayland’s allowed-with-obligations status. |
| Raw publication must fail closed. | reports/deep-dives/SPEC-REPO-16/COMPLIANCE_EXPORT_GATE.md; reports/deep-dives/SPEC-REPO-16/compliance_export_gate.jsonl rows for SPEC-REPO-05/06 | Public metadata only; upstream URLs preserved above | derived export policy, license status, redistribution status, secret/PII/internal scan status, reason codes | This page deliberately avoids direct long excerpts. Metadata and synthesis are not raw-content permission. |

Cohort map

RFC, ADR, and proposal-heavy governance repositories

SPEC-REPO-05 selected five repositories whose primary artifacts are proposals, RFCs, enhancement proposals, or improvement standards:

| Repository | Artifact family | Inspected commit | Inventory / LOC signal | Compliance caveat |
| --- | --- | --- | --- | --- |
| rust-lang/rfcs | language RFCs | 318c2acc1b22 | 641 candidate specs; 198,549 spec LOC; 6,028 approximate spec-changing commits | Apache-2.0; raw export still review-required by the fail-closed gate because PII scan evidence is missing. |
| kubernetes/enhancements | KEPs / enhancement proposals | e2ddea8769ce | 1,722 candidate specs; 433,941 spec LOC; 33 same-commit spec/code co-change events | Apache-2.0; raw export review-required before release because PII scan evidence is missing. |
| open-telemetry/opentelemetry-specification | current specification plus OTEP history | 4034703db495 | 174 candidate specs; 44,601 spec LOC; current specs mixed with proposal history and generation tooling | Apache-2.0; raw export review-required before release because PII scan evidence is missing. |
| emberjs/rfcs | framework RFC process | 01909817d773 | 259 candidate specs; 71,191 spec LOC; stage/state movement is a lineage issue | Metadata-only: this pass had no normalized license metadata. |
| ethereum/EIPs | EIP/ERC/RIP improvement proposals | 257357a1447c | 916 candidate specs; 95,465 spec LOC; 95 same-commit spec/code co-change events | CC0-1.0, but fail-closed raw export still needs PII-scan review. |

The useful distinction is not “Markdown versus code.” It is where authority lives. In proposal-heavy repositories, authority sits in accepted status, numbered files, stage directories, templates, OWNERS/reviewer process, validation scripts, and merge history. Implementations often live outside the repository, so same-repo co-change undercounts coupling by design.
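As a minimal sketch of how a same-commit spec/code co-change count could be computed, assume each commit has already been reduced to its list of touched paths. The `is_spec` and `is_code` predicates and the sample paths are hypothetical; the scout's actual path rules are not reproduced here.

```python
from typing import Callable, Iterable, Sequence


def count_same_commit_cochange(
    commits: Iterable[Sequence[str]],
    is_spec: Callable[[str], bool],
    is_code: Callable[[str], bool],
) -> int:
    """Count commits that touch at least one spec path and one code path."""
    total = 0
    for paths in commits:
        touches_spec = any(is_spec(p) for p in paths)
        touches_code = any(is_code(p) for p in paths)
        if touches_spec and touches_code:
            total += 1
    return total


# Hypothetical predicates: Markdown counts as spec, Go/Python as code.
def is_spec(path: str) -> bool:
    return path.endswith(".md")


def is_code(path: str) -> bool:
    return path.endswith((".go", ".py"))


# Hypothetical commit history, one list of touched paths per commit.
commits = [
    ["text/0001-example.md"],                       # spec only
    ["text/0002-example.md", "tools/validate.py"],  # spec and code
    ["tools/validate.py"],                          # code only
]
cochange = count_same_commit_cochange(commits, is_spec, is_code)
```

A low count from a function like this says nothing about coupling when the governed implementation lives in another repository, which is why the count should be stored alongside a coupling-scope field rather than read on its own.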

The ADR side remains explicitly incomplete. SPEC-REPO-05 recorded failed web searches and a GitHub rate-limit failure, then used high-signal public seed repositories. That means this page should not be read as an ADR census. It is a first typological slice, not a triumphal parade.

Executable contract repositories

SPEC-REPO-06 and SPEC-REPO-18 cover API schemas, IDLs, formal specs, and large generator/test corpora:

| Repository | Contract family | Inspected commit | Evidence signal | Raw/export caveat |
| --- | --- | --- | --- | --- |
| open-telemetry/opentelemetry-proto | Protocol Buffers telemetry contracts | 62498ba4f8e2 | 13 contract files, 3,236 spec LOC, 29 same-commit spec/code events | Apache-2.0; metadata and synthesis here only. |
| temporalio/api | Protocol Buffers service API contracts | 8e0453c3a176 | 64 contract files, 57,382 spec LOC, 64 same-commit spec/code events | MIT; metadata and synthesis here only. |
| smithy-lang/smithy | Smithy IDL/model contracts and protocol docs | a8766999c9af | 1,728 contract files, 152,227 spec LOC, 738 same-commit spec/code events | Hard-quarantined for raw content by SPEC-REPO-16 due to flagged secret-like example scan hits. |
| oapi-codegen/oapi-codegen | OpenAPI-driven code generation and regression fixtures | c346d1273edb | 13 selected contract files, 966 spec LOC, 80,747 test LOC | Apache-2.0; fixture/authoritative-contract separation is required. |
| asyncapi/spec | AsyncAPI JSON Schema and normative spec/examples | e0078a119c0c | 25 contract files, 4,255 spec LOC, 4 same-commit spec/code events | Hard-quarantined for raw content by SPEC-REPO-16 due to flagged secret-like example scan hits. |
| graphql/graphql-spec | GraphQL normative language specification | 1fe9b61b3151 | 12 formal/contract files, 8,994 spec LOC | NOASSERTION; review-required. |
| apache/thrift | Thrift IDL compiler/runtime | c1710f06e13d | 210 contract/formal files, 17,771 spec LOC, 331 same-commit spec/code events | Apache-2.0; allowed in scout, but this page still publishes metadata only. |
| tlaplus/tlaplus | TLA+ formal specification language/toolchain | 209713870170 | 1,056 formal/spec files, 114,444 spec LOC, 308 same-commit spec/code events | MIT but review-required in scout; .tla files mix examples, tests, specs, and tool implementation context. |
| dafny-lang/dafny | Dafny verification language and proof/program corpus | 6a33d0af6fd7 | 2,201 formal/spec files, 217,081 spec LOC, 1,886 same-commit spec/code events | NOASSERTION; review-required. |
| OpenAPITools/openapi-generator | large OpenAPI generator monorepo | 600c13a148ce | 1,530 contract/formal files and sparse-path analysis over a 65k-file tree | Review-required; blobless sparse fallback and limited history coverage must stay visible. |

Machine-readable contracts invert the ordinary “docs support code” assumption. A .proto, .smithy, OpenAPI file, AsyncAPI schema, GraphQL spec chapter, .thrift, .tla, or .dfy file may be source, test, example, specification, model, and tool input at once. The scanner therefore needs artifact-family fields, not just doc_type = spec.
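A minimal sketch of what artifact-family fields could look like at scan time, assuming a purely path-based first pass. The extension map and directory role hints are hypothetical illustrations, not the scanner's actual rules; the family labels follow the names used on this page.

```python
from pathlib import PurePosixPath

# Hypothetical extension map; labels mirror the artifact families named on this page.
EXTENSION_FAMILY = {
    ".proto": "protobuf",
    ".smithy": "smithy",
    ".thrift": "thrift_idl",
    ".tla": "tla_model",
    ".dfy": "dafny_program_or_proof",
}

# Hypothetical directory hints for the role a file plays in its repository.
ROLE_HINTS = {
    "test": "test",
    "tests": "test",
    "examples": "example",
    "docs": "documentation",
}


def classify(path: str) -> dict:
    """Return an artifact family plus every role the path hints at."""
    p = PurePosixPath(path)
    family = EXTENSION_FAMILY.get(p.suffix.lower(), "unknown")
    # Keep all matching roles instead of picking one, so ambiguity survives.
    roles = sorted({ROLE_HINTS[part] for part in p.parts[:-1] if part in ROLE_HINTS})
    return {"path": path, "artifact_family": family, "roles": roles or ["unresolved"]}


record = classify("examples/tests/Paxos.tla")
```

Extension matching is deliberately incomplete: OpenAPI and AsyncAPI documents are ordinary YAML or JSON files, so they would need content inspection rather than a suffix check.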

How these differ from Markdown product specs

Markdown product specs, including LLM-facing specs in llm-readable-spec-files, usually work as durable intent: requirements, non-goals, acceptance criteria, examples, and verification plans. They can be excellent when structured, but their force is mostly social and procedural unless a harness links them to tests, review gates, or code generation.

RFC/proposal repositories are different:

  • They record governance over time: draft, accepted, recommended, released, stagnant, superseded, or migrated states.
  • The lifecycle is often structural: numbered proposal files, stage directories, KEP paths, templates, OWNERS/review process, and validation CI.
  • Their implementation edge is often external. Rust RFCs, Kubernetes KEPs, Ember RFCs, Ethereum EIPs, and OpenTelemetry OTEPs may govern ecosystem code that does not live in the same repository.
  • A filename alone is weak evidence. The same Markdown pattern may be current normative spec, historical proposal, template, rejected idea, or migration note.
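One way to act on the last point, sketched under assumptions: assign a lifecycle state only when a declared status or a directory hint exists, and record which kind of evidence was used. The directory names and status values below are hypothetical; real repositories each use their own layout and vocabulary.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Hypothetical directory hints; not taken from any specific repository.
DIRECTORY_STATE = {
    "templates": "template",
    "rejected": "rejected",
    "accepted": "accepted",
    "drafts": "draft",
}


def lifecycle_state(path: str, declared_status: Optional[str] = None) -> Tuple[str, str]:
    """Return (state, evidence_source); never infer state from the filename alone."""
    # A status declared in the document itself is the strongest signal here.
    if declared_status:
        return declared_status.strip().lower(), "declared_status"
    # Otherwise fall back to parent directories, ignoring the filename.
    for part in PurePosixPath(path).parts[:-1]:
        if part in DIRECTORY_STATE:
            return DIRECTORY_STATE[part], "directory"
    return "unknown", "filename_only"


state = lifecycle_state("text/0001-example.md")
```

Returning the evidence source alongside the state lets a later pass distinguish "accepted because the document says so" from "accepted because of where the file sits".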

Executable contracts are different again:

  • OpenAPI and AsyncAPI describe interface surfaces that tools can validate, render, generate clients/servers from, or use as conformance fixtures.
  • Protocol Buffers and Thrift are IDLs: schema changes carry compatibility and generated-code consequences.
  • Smithy is a service-modeling IDL with traits, protocol definitions, validators, docs, and generated artifacts around a model graph.
  • GraphQL’s repository is mostly normative prose and grammar, but the spec defines a language contract: validation, execution, response, introspection, and grammar semantics are more precise than ordinary product prose.
  • TLA+ and Dafny are not “requirements docs.” They are formal or checkable models and programs. A .tla or .dfy file can be a proof/checking surface, a test fixture, a tutorial artifact, or implementation-adjacent source; classification must preserve that ambiguity instead of sanding it flat.

This is the central modeling rule: artifact class has to be part of the dataset unit. A product spec, a KEP, an EIP, a .proto, a .smithy model, a GraphQL normative chapter, a .thrift file, a .tla model, and a .dfy verified program are all specification-like, but they imply different authority, validation, versioning, and coupling mechanisms.

Non-GitHub standards and protocol corpora

SPEC-REPO-19 added four public-forge repositories that broaden the cohort beyond GitHub:

| Repository | Forge | Inspected commit | Spec signal | Publication caveat |
| --- | --- | --- | --- | --- |
| ariadne/ariadne-identity-specification | Codeberg / Forgejo | 20faf8fe62f5 | 9 Markdown spec files across core, related specs, and ARCs | Review-required because a signature-profile document includes illustrative private-key-like material; no raw examples reproduced. |
| ~sircmpwn/hare-specification | SourceHut | 91492e0a1b29 | 16 LaTeX language-spec files; build tooling around normative PDF source | Metadata-only because no conventional top-level license was detected. |
| openid/connect | Bitbucket | a139823c56bc | 57 XML/Markdown standards files; core, discovery, registration, session, identity assurance, and federation drafts | Review-required because no simple repo-level OSS license was found and public contact metadata is present. |
| wayland/wayland-protocols | freedesktop GitLab | e193660a389f | 69 XML protocol/spec files; stable, staging, experimental, unstable/deprecated phase semantics | Allowed with obligations under MIT/X.org-style license; notice obligations remain. |

The lesson is operationally prosaic and therefore important: discovery is host-shaped. Codeberg repo search, SourceHut HTML/project pages, Bitbucket workspace APIs, and instance-local GitLab search are not interchangeable with GitHub code search. The corpus should preserve host, endpoint, raw retrieval status, and rejected/fork-heavy candidate evidence before making coverage claims.
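A minimal sketch of a per-host discovery record, assuming the corpus stores one row per discovery attempt. The class name, the endpoint labels, and the rule in `coverage_claim_allowed` are hypothetical; they only illustrate keeping failures and rejected candidates next to the successes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscoveryAttempt:
    host: str                           # forge hostname, e.g. "codeberg.org"
    endpoint: str                       # hypothetical label for the search surface used
    status: str                         # "ok", "failed", or "unsupported"
    http_status: Optional[int] = None   # recorded when the host returned an error
    rejected_candidates: List[str] = field(default_factory=list)


def coverage_claim_allowed(attempts: List[DiscoveryAttempt]) -> bool:
    """Hypothetical rule: claim coverage only if every recorded attempt succeeded."""
    return bool(attempts) and all(a.status == "ok" for a in attempts)


attempts = [
    DiscoveryAttempt("codeberg.org", "repo-search", "ok"),
    DiscoveryAttempt("github.com", "code-search", "failed", http_status=401),
]
claim_ok = coverage_claim_allowed(attempts)
```

With a failed attempt on record the claim is refused, so a gap in one host's search surface stays visible instead of silently shrinking the denominator.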

Compliance and public-safety gate

The public-safe boundary for this page is synthesis plus metadata:

| Gate class | Records affected here | Public wiki treatment |
| --- | --- | --- |
| hard_quarantine_raw_content | SPEC-REPO-06:asyncapi__spec, SPEC-REPO-06:smithy-lang__smithy | No raw text excerpts. Only repo URL, commit, counts, paths, and paraphrased interpretation. |
| metadata_only | emberjs/rfcs, ~sircmpwn/hare-specification, plus any missing/NOASSERTION or metadata-only rows under the gate | Metadata and synthesis only; no raw corpus bodies. |
| review_required / review_required_before_raw_export | GraphQL, Dafny, TLA+, OpenAPI Generator, Ariadne, OpenID, and several permissively licensed proposal repos under missing scan evidence | Treat public readability as insufficient for raw publication. Human/legal/security review is a separate gate. |
| allowed_with_obligations | wayland/wayland-protocols | Preserve license/notice obligations; this page still chooses metadata-only publication. |

The hard-quarantine caveat is not decoration. In the source gate, hard quarantine means raw content export is blocked even when the upstream repository is public and permissively licensed, because scan evidence found secret-like example material. Public examples can still be unsafe to launder into a derived corpus while scanner findings remain unadjudicated.
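The fail-closed behaviour can be sketched as a small decision function. This is an illustration only: the gate class names come from the table above, but the scan status values (`"flagged"`, `"clean"`) and the order of the checks are assumptions, not the SPEC-REPO-16 gate's actual logic.

```python
from typing import Optional


def raw_export_policy(
    license_id: Optional[str],
    secret_scan: Optional[str],
    pii_scan: Optional[str],
) -> str:
    """Derive a raw-export policy that fails closed on missing or bad evidence."""
    # Secret-like hits block raw export regardless of license.
    if secret_scan == "flagged":
        return "hard_quarantine_raw_content"
    # Missing or unasserted license: metadata only (assumed precedence).
    if license_id is None or license_id == "NOASSERTION":
        return "metadata_only"
    # Any scan that is missing or not clean sends the record to review.
    if secret_scan != "clean" or pii_scan != "clean":
        return "review_required"
    # Even the most permissive outcome still carries license/notice obligations.
    return "allowed_with_obligations"


policy = raw_export_policy("Apache-2.0", "clean", None)
```

The point of the shape is that absence of evidence never maps to permission: a missing PII scan on a permissively licensed repository still lands in review.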

Negative evidence and uncertainty

  • SPEC-REPO-05 preserved failed Firecrawl/web-search attempts and GitHub HTTP 403 rate-limit failures for ADR/RFC discovery. ADR coverage is therefore a documented gap, not a census.
  • SPEC-REPO-06 preserved GitHub code-search and API HTTP 403 failures; selected executable-contract repos were curated from known public seeds and local clone inspection.
  • SPEC-REPO-18 preserved the GitHub code-search HTTP 401 failure and pygount availability context; because OpenAPI Generator was inspected through a sparse clone, its path-level evidence is not the same as full blob/history mining.
  • SPEC-REPO-19 preserved host-specific search limitations and rejected/deferred candidates; no non-GitHub host was forced into a GitHub-shaped schema.
  • Current-star, issue, and co-change signals are association evidence, not causal pressure timelines. The more honest sentence is “associated with,” not “caused by.” Tiny wording choices are where datasets either grow bones or become jam.

Dataset schema implications

For the next corpus schema pass, these cohorts argue for separate fields for:

  1. artifact_family: product_spec, rfc, adr, improvement_proposal, openapi, asyncapi, protobuf, smithy, graphql_spec, thrift_idl, tla_model, dafny_program_or_proof, standards_xml, latex_language_spec.
  2. authority_origin: standards body, language governance, project governance, source-of-truth IDL, generated fixture, conformance test, tutorial/example, or external implementation edge.
  3. validation_surface: lint, schema validation, compiler/codegen, model checker, verifier, conformance tests, CI, or human review.
  4. implementation_coupling_scope: same repo, sibling repo, ecosystem-wide, standards body, or unknown.
  5. raw_export_policy: derived fail-closed status separate from metadata export.
  6. clone_and_history_coverage: full, blobless, sparse, first-parent-limited, rate-limited, or host-specific retrieval.

Without those fields, an RFC and a .proto file both become “spec-like text,” which is true in the same way that a theorem and a Post-it note both contain symbols. The distinction rather matters.

Deep-dive navigation