OGI

World Models

The world-models surface predicts future latent states given current latent states and proposed actions. It is the surface that lets the system imagine outcomes without acting in the world.

Role

A world model serves three functions:

  1. Planning. The reasoning surface uses world-model rollouts to score proposed plans.
  2. Sample efficiency. Embodied learning bootstraps from imagined trajectories before consuming real-world data.
  3. Anomaly detection. When observed outcomes diverge from predicted outcomes, the divergence flags situations where the model's predictions cannot yet be trusted.

Architecture

The world-model surface is a latent-space transformer trained to autoregressively predict the next latent chunk given the previous chunk and an action token.

Property             Value
Parameters           1.4B
Context              32k latent tokens
Prediction horizon   16 chunks ahead (~64k tokens)
Output               Next-chunk latent + uncertainty estimate

Crucially, the world model operates in the latent space, not in pixel or signal space. Pixel-level prediction is wasteful for downstream use; the latent already contains the abstractions other surfaces will consume.

Training

The world model is trained on two streams:

  • Passive video. Public web video and validator-contributed embodied video provide a vast supply of action-free temporal data. The world model treats the absent action as a learned null token.
  • Active rollouts. When the embodiment surface executes an action, the resulting transition (s, a, s') in latent space is logged and used as a paired training example.

The second stream is the unique advantage of a distributed embodied network: every validator running embodiment produces world-model training data as a byproduct.
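As a rough illustration of how the two streams could share one training format, the sketch below maps both onto (s, a, s') records. The names Transition, NULL_ACTION, from_passive_video, and from_active_rollout are hypothetical, and the string "<null>" only stands in for the learned null token; the actual data pipeline is not described here.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical placeholder for the learned null action token.
NULL_ACTION = "<null>"


@dataclass(frozen=True)
class Transition:
    state: Sequence[float]       # latent chunk s
    action: str                  # action token a, or NULL_ACTION
    next_state: Sequence[float]  # latent chunk s'


def from_passive_video(chunks: List[Sequence[float]]) -> List[Transition]:
    """Pair consecutive latent chunks; no action was recorded, so use the null token."""
    return [Transition(s, NULL_ACTION, s_next) for s, s_next in zip(chunks, chunks[1:])]


def from_active_rollout(s: Sequence[float], a: str, s_next: Sequence[float]) -> Transition:
    """Wrap a logged (s, a, s') transition from the embodiment surface."""
    return Transition(s, a, s_next)
```

Under this assumption, both streams feed the same next-chunk objective, and only the action field distinguishes them.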

Uncertainty

A world model that does not know what it does not know is useless for planning. The surface emits per-prediction uncertainty in two forms:

Epistemic. Disagreement among an ensemble of forward passes with different dropout masks. High epistemic uncertainty indicates the model has seen too few similar situations during training.

Aleatoric. Predicted variance of the next state. High aleatoric uncertainty indicates the outcome is intrinsically stochastic.

Reasoning conditions on these uncertainties. High-epistemic regions trigger active information gathering; high-aleatoric regions trigger risk-aware planning.
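The two uncertainty forms can be sketched for a single scalar latent dimension as follows. This is a minimal sketch, assuming each dropout-masked forward pass returns a predicted mean and a predicted variance. The function names, the thresholds, and the rule that the epistemic check runs first are all assumptions made for illustration.

```python
from statistics import fmean, pvariance
from typing import List, Tuple


def decompose_uncertainty(passes: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Each pass is (predicted_mean, predicted_variance) from one dropout mask.

    Returns (epistemic, aleatoric).
    """
    means = [m for m, _ in passes]
    variances = [v for _, v in passes]
    epistemic = pvariance(means)  # disagreement between passes
    aleatoric = fmean(variances)  # average predicted outcome noise
    return epistemic, aleatoric


def planning_mode(epistemic: float, aleatoric: float,
                  epi_threshold: float = 0.1, alea_threshold: float = 0.1) -> str:
    """Hypothetical routing rule; thresholds and ordering are assumptions."""
    if epistemic > epi_threshold:
        return "gather_information"
    if aleatoric > alea_threshold:
        return "risk_aware"
    return "standard"
```

Passes that agree on the mean but each predict a wide variance yield low epistemic and high aleatoric uncertainty, which this rule routes to risk-aware planning.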

Rollout protocols

Two rollout modes are exposed to callers:

  1. Open-loop. Given a sequence of actions, predict the trajectory. Used for plan scoring.
  2. Branching. Given a state, expand the top-k actions per step and return a tree of predicted trajectories. Used by MCTS in the reasoning surface.
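The branching mode can be pictured as a breadth-first tree expansion. The sketch below assumes two hypothetical callables: propose(state), which returns candidate actions ranked best first, and predict(state, action), which returns the predicted next state. It bounds the tree by depth only; terminal-state and uncertainty checks are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


@dataclass
class Node:
    state: State
    action: str = ""  # action that led to this node; empty at the root
    children: List["Node"] = field(default_factory=list)


def branch(state: State,
           propose: Callable[[State], Sequence[str]],
           predict: Callable[[State, str], State],
           k: int,
           depth: int) -> Node:
    """Expand the top-k proposed actions per step into a tree of predicted states."""
    root = Node(state)
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for action in propose(node.state)[:k]:
                child = Node(predict(node.state, action), action)
                node.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root
```

The tree grows as k to the power of depth, so a caller such as MCTS would normally expand selectively rather than exhaustively as done here.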

Both modes terminate when:

  • A terminal state is reached (goal achieved or constraint violated), or
  • The horizon is exhausted, or
  • Uncertainty exceeds a configured ceiling (the model refuses to predict).

The third condition is critical. A world model that confidently extrapolates beyond its training distribution is more dangerous than no world model.
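The three termination conditions can be sketched for the open-loop mode as below. This is a sketch under stated assumptions, not the surface's actual API: predict is a hypothetical callable returning (next_state, uncertainty), is_terminal is a hypothetical goal or constraint check, and the default ceiling of 1.0 is arbitrary. The fourth stop reason, ACTIONS_EXHAUSTED, is added only so the function has a defined result when the action list is shorter than the horizon.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Stop(Enum):
    TERMINAL = "terminal"
    HORIZON = "horizon"
    UNCERTAINTY = "uncertainty"
    ACTIONS_EXHAUSTED = "actions_exhausted"


def open_loop_rollout(state: float,
                      actions: Sequence[float],
                      predict: Callable[[float, float], Tuple[float, float]],
                      is_terminal: Callable[[float], bool],
                      horizon: int = 16,
                      ceiling: float = 1.0) -> Tuple[List[float], Stop]:
    """Predict a trajectory for a fixed action sequence, stopping early when required."""
    trajectory = [state]
    for step, action in enumerate(actions):
        if step >= horizon:
            return trajectory, Stop.HORIZON
        next_state, uncertainty = predict(state, action)
        if uncertainty > ceiling:
            # Refuse to predict: the uncertain state is not added to the trajectory.
            return trajectory, Stop.UNCERTAINTY
        state = next_state
        trajectory.append(state)
        if is_terminal(state):
            return trajectory, Stop.TERMINAL
    return trajectory, Stop.ACTIONS_EXHAUSTED
```

Returning the stop reason alongside the trajectory lets a caller tell a refused prediction apart from a completed plan.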

Verification

World model predictions are verified post-hoc: when the embodiment surface executes an action, the predicted next state is compared to the observed next state. Persistent divergence in a region of state space is a signal for retraining priority.

This verification is one of the more elegant features of the network: every embodied action produces a labeled training example for the world model with no additional infrastructure.
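One way to turn post-hoc comparisons into a retraining priority is to track prediction error per region of state space. The sketch below is illustrative only: the DivergenceTracker class, the region key, the Euclidean distance metric, and the min_samples cutoff (a crude stand-in for "persistent") are all assumptions.

```python
from collections import defaultdict
from math import dist
from typing import Dict, Hashable, List, Sequence


class DivergenceTracker:
    """Accumulate prediction errors per region and rank regions for retraining."""

    def __init__(self) -> None:
        self._errors: Dict[Hashable, List[float]] = defaultdict(list)

    def record(self, region: Hashable,
               predicted: Sequence[float],
               observed: Sequence[float]) -> float:
        """Log the distance between predicted and observed next latent states."""
        error = dist(predicted, observed)
        self._errors[region].append(error)
        return error

    def retraining_priority(self, min_samples: int = 3) -> List[Hashable]:
        """Regions with enough samples, ordered by mean error (largest first)."""
        means = {
            region: sum(errs) / len(errs)
            for region, errs in self._errors.items()
            if len(errs) >= min_samples
        }
        return sorted(means, key=means.get, reverse=True)
```

Requiring a minimum sample count keeps a single surprising transition from dominating the ranking.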

What world models do not do

They do not select actions. They are pure forward simulators conditional on an action input. Action selection happens in the reasoning surface, which uses the world model as one of several inputs.

They do not handle long horizons reliably. The 16-chunk horizon is a hard limit on what callers should act on; any prediction extended beyond it should be treated as illustrative, not actionable. Long-horizon planning is accomplished by composition: the reasoning surface plans hierarchically, with the world model rolling out the leaves.
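As a very rough picture of composition, a long plan can be cut into leaf segments that each fit within the horizon, so that no single world-model rollout exceeds 16 chunks. The helper split_into_leaves is hypothetical and greatly simplifies hierarchical planning; it shows only the horizon bookkeeping.

```python
from typing import List, Sequence, TypeVar

A = TypeVar("A")


def split_into_leaves(actions: Sequence[A], horizon: int = 16) -> List[List[A]]:
    """Cut a long action sequence into consecutive segments of at most `horizon` steps."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return [list(actions[i:i + horizon]) for i in range(0, len(actions), horizon)]
```

How the start state of each leaf is chosen is up to the reasoning surface and is not shown here.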

Coupling

The world model is the most tightly coupled of the cognitive surfaces. It is coupled to:

  • Vision for grounded state input.
  • Reasoning as its primary consumer.
  • Embodiment for action conditioning and ground-truth feedback.
  • Memory for retrieval of similar past trajectories.

A world model evaluated in isolation is misleading. Its quality is the quality of the closed loop: prediction → plan → action → observation → update.