Language
The language surface is responsible for encoding and producing token streams in natural human languages and structured formal languages (code, math, query languages, formal logic). It is the most studied surface in the field and the most thoroughly specified here.
Architecture
The language surface is a decoder-only transformer with the following profile:
| Property | Value |
|---|---|
| Parameters | 7.0B dense (always active) + 32B in sparse experts |
| Layers | 48 |
| Attention heads | 32 |
| Hidden dimension | 4,096 (matches the shared latent) |
| Context window | 131,072 tokens |
| Vocabulary | 256,000 BPE tokens |
| Position encoding | RoPE with NTK-aware scaling |
| FFN | SwiGLU; sparse mixture-of-experts in odd layers |
The sparse-expert path is activated only when the latent indicates a domain shift requiring it; in the steady state, the dense path handles approximately 78% of forward passes. Routing is learned and observable; routing weights are part of the published checkpoint.
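The dense-default routing rule can be pictured in a short sketch. This is a minimal illustration, assuming a learned linear gate, a scalar domain-shift score, and top-2 expert selection; the module names and thresholds are hypothetical, and only the dense-by-default behavior and the learned, observable routing weights come from the profile above.

```python
# Sketch of dense-default MoE routing. The gating rule, shift score, and
# top-2 selection are illustrative assumptions, not the published checkpoint.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseDefaultMoE(nn.Module):
    """FFN block that uses the dense path in the steady state and routes to
    sparse experts only when a learned score signals a domain shift."""

    def __init__(self, d_model=4096, d_ff=11008, n_experts=8, shift_threshold=0.5):
        super().__init__()
        self.dense = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                                   nn.Linear(d_ff, d_model))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)   # learned, observable routing weights
        self.shift = nn.Linear(d_model, 1)          # scores "domain shift" from the latent
        self.shift_threshold = shift_threshold

    def forward(self, x):                           # x: (tokens, d_model)
        if torch.sigmoid(self.shift(x.mean(dim=0))) < self.shift_threshold:
            return self.dense(x)                    # steady state: dense path (~78% of passes)
        weights = F.softmax(self.gate(x), dim=-1)   # (tokens, n_experts)
        top_w, top_i = weights.topk(2, dim=-1)      # route each token to its top-2 experts
        out = torch.zeros_like(x)
        for k in range(2):
            for e, expert in enumerate(self.experts):
                mask = top_i[:, k] == e
                if mask.any():
                    out[mask] += top_w[mask, k:k + 1] * expert(x[mask])
        return out
```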
Tokenization
A single tokenizer covers natural language, code, and protocol-level formal sublanguages (instruction syntax, embodiment specifications, math). The tokenizer uses byte fallback, so arbitrary bytes can be encoded without loss. Reserved tokens delimit modality boundaries within the shared latent (`<vision>`, `<motor>`, `<plan>`, `<memory>`).
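A minimal encoding sketch follows, assuming a greedy longest-match loop over bytes. The token IDs, the maximum piece length, and the toy vocabulary are illustrative; only the byte-fallback guarantee and the reserved-token set come from this section.

```python
# Sketch of byte-fallback encoding with reserved modality tokens.
# IDs and the greedy longest-match loop are assumptions for illustration.
RESERVED = {"<vision>": 255996, "<motor>": 255997,
            "<plan>": 255998, "<memory>": 255999}

def encode(text, vocab, max_piece=16):
    """Greedy longest-match encoding with byte fallback: any span absent
    from the vocabulary degrades to raw byte tokens (IDs 0-255), so
    arbitrary bytes round-trip without loss."""
    ids, data, i = [], text.encode("utf-8"), 0
    while i < len(data):
        for j in range(min(len(data), i + max_piece), i, -1):
            if data[i:j] in vocab:          # longest known piece wins
                ids.append(vocab[data[i:j]])
                i = j
                break
        else:
            ids.append(data[i])             # byte fallback: raw byte as its own token
            i += 1
    return ids

vocab = {b"cup": 300, b" on": 301}          # toy vocabulary
caption_ids = [RESERVED["<vision>"]] + encode("red cup on a mat", vocab)
```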
Input pathways
The language surface receives input from:
- Direct text. User-supplied or document-derived.
- Vision captions. Emitted by the vision surface and tokenized back into language.
- Memory recall. Surfaced from long-term memory by the memory surface.
- Plans. Structured outputs from the reasoning surface, rendered as tokens.
The surface does not distinguish these origins at the architectural level; they are differentiated only by the special tokens that prefix them.
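A short sketch of how a mixed-origin context might be assembled, assuming hypothetical helper names and special-token IDs; only the convention that origin is marked solely by a reserved prefix token comes from this section.

```python
# Sketch: mixed-origin inputs are concatenated into one token stream,
# distinguished only by the reserved token prefixing each segment.
# Helper names and special-token IDs are illustrative assumptions.
from typing import Callable, List, Optional

SPECIAL = {"<vision>": 255996, "<plan>": 255998, "<memory>": 255999}

def build_context(
    user_text: str,
    encode: Callable[[str], List[int]],     # any tokenizer's encode function
    caption: Optional[str] = None,          # emitted by the vision surface
    recalled: Optional[str] = None,         # surfaced by the memory surface
    plan: Optional[str] = None,             # rendered by the reasoning surface
) -> List[int]:
    ids: List[int] = []
    for tag, text in (("<vision>", caption), ("<memory>", recalled), ("<plan>", plan)):
        if text is not None:
            ids += [SPECIAL[tag]] + encode(text)
    ids += encode(user_text)                # direct text carries no modality prefix
    return ids

# Usage with a stand-in byte-level encoder:
tokens = build_context("what is on the table?", encode=lambda s: list(s.encode()),
                       caption="a red cup on a wooden table")
```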
Output pathways
The surface emits to:
- User output. Detokenized to UTF-8.
- Reasoning surface. As intermediate steps in chain-of-thought.
- Embodiment surface. As high-level instructions to be grounded into motor plans.
- Memory surface. As candidates for long-term retention.
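The four destinations can be named explicitly. The enum and handler below are a hypothetical sketch, since this section specifies only where outputs flow, not an interface; note that only the user path requires detokenization.

```python
# Sketch of output dispatch. Only Sink.USER requires detokenization;
# the other surfaces consume the raw token stream. Names are illustrative.
from enum import Enum, auto

class Sink(Enum):
    USER = auto()        # detokenized to UTF-8
    REASONING = auto()   # intermediate chain-of-thought steps
    EMBODIMENT = auto()  # high-level instructions for motor grounding
    MEMORY = auto()      # candidates for long-term retention

def emit(token_ids, sink, detokenize):
    if sink is Sink.USER:
        return detokenize(token_ids)   # bytes -> UTF-8 string for the user
    return token_ids                   # downstream surfaces take tokens as-is
```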
Training signals
The surface is trained against four objectives, mixed during pretraining and rebalanced during fine-tuning:
- Next-token prediction on the raw web crawl, deduplicated and license-filtered but otherwise unfiltered.
- Span infilling on the code subset and on the structured subset.
- Instruction following on a curated mixture of public and validator-contributed instruction data.
- Direct preference optimization on human and model-judged comparisons.
The pretraining corpus and the fine-tuning mixtures are specified in Pretraining.
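The mixing can be pictured as a weighted sum of per-objective losses. The weights below are placeholders, not the published schedule, which lives in Pretraining; fine-tuning rebalances by swapping in a different mix.

```python
# Sketch of the four-objective mixture. The weights are illustrative
# placeholders; fine-tuning rebalances by swapping in a different mix dict.
PRETRAIN_MIX = {"next_token": 0.70, "span_infill": 0.20,
                "instruction": 0.08, "dpo": 0.02}

def mixed_loss(batch, losses, mix=PRETRAIN_MIX):
    """losses maps objective name -> callable(batch) -> scalar loss."""
    return sum(w * losses[name](batch) for name, w in mix.items())

# e.g. with stand-in losses:
demo = mixed_loss(batch=None, losses={k: (lambda b: 1.0) for k in PRETRAIN_MIX})
```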
Latency budget
The language surface declares the following latency tiers:
| Tier | TTFT | Throughput | Routed to |
|---|---|---|---|
| Interactive | < 250 ms | > 80 tok/s | Default user queries |
| Bulk | < 5 s | > 200 tok/s | Long-context summarization, batch jobs |
| Background | < 60 s | best-effort | Memory consolidation, model self-talk |
Validators advertise the tier they serve. Routing across tiers is automatic; users do not select a tier.
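A routing sketch under assumed classification heuristics: the tier names and targets come from the table, while the thresholds and request fields are hypothetical.

```python
# Sketch of automatic tier routing. The classification heuristics and
# request fields are assumptions; the tier targets come from the table above.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    ttft_ms: float         # time-to-first-token target
    min_tok_per_s: float   # throughput floor (0 = best-effort)

INTERACTIVE = Tier("interactive", 250, 80)
BULK = Tier("bulk", 5_000, 200)
BACKGROUND = Tier("background", 60_000, 0)

def route(request: dict) -> Tier:
    """Users never pick a tier; the router classifies the request."""
    if request.get("origin") in ("memory_consolidation", "self_talk"):
        return BACKGROUND
    if request.get("context_tokens", 0) > 32_000 or request.get("batch"):
        return BULK
    return INTERACTIVE
```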
Verification
Language outputs are verified by the network through three mechanisms:
- Deterministic re-execution. A sampled subset of queries is re-run on a second validator with the same seed; the two outputs must agree under an exact-match criterion.
- Eval-suite sampling. A held-out eval suite is periodically run against the active checkpoint. Performance regressions trigger automatic rollback.
- Cross-surface consistency. Outputs are checked against the latent state they were produced from; outputs incompatible with the surrounding latent are penalized.
See Validators for the economic mechanics of these checks.
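The first mechanism can be sketched directly. The sampling rate and function names below are hypothetical; the same-seed, exact-match rule is from the list above.

```python
# Sketch of deterministic re-execution. The 5% sampling rate and the
# run_on interface are assumptions; the same-seed exact-match rule is not.
import random

def verify_sample(queries, run_on, validators, sample_rate=0.05, seed=1234):
    """Re-run a sampled subset on a second validator with the same seed;
    the two token streams must agree exactly."""
    mismatches = []
    rng = random.Random(seed)
    for q in rng.sample(queries, max(1, int(len(queries) * sample_rate))):
        a = run_on(validators[0], q, seed=seed)
        b = run_on(validators[1], q, seed=seed)
        if a != b:                 # exact match on the full token stream
            mismatches.append(q)
    return mismatches              # non-empty -> trigger the penalty path
```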
What this surface does not do
It does not reason. Multi-step problem solving lives in the reasoning surface, which the language surface can call. It does not remember. Persistent state across sessions lives in the memory surface. It does not act on the world. Physical action lives in the embodiment surface.
A language model that performs all four functions is, in the codex's terminology, an undocumented monolith. The codex deliberately decomposes.