Continual Learning
Continual learning is the system's capacity to acquire new competences without forgetting prior ones. The codex's general intelligence cannot be a single training run; it must be a continuous process over the deployed network.
The problem
A network trained on task $A$ then fine-tuned on task $B$ typically exhibits catastrophic forgetting: performance on $A$ degrades sharply. Formally, if $\theta_B^*$ is the unconstrained optimum on $B$, then

$$\mathcal{L}_A(\theta_B^*) \gg \mathcal{L}_A(\theta_A^*)$$

in general. Continual learning is the family of techniques that prevent this.
Three regimes
The codex distinguishes three continual-learning regimes:
| Regime | What changes | Cadence | Example |
|---|---|---|---|
| Task addition | New task, existing modalities | weekly | New benchmark passes added |
| Modality addition | New modality | monthly | New sensor or actuator type |
| Distribution shift | Underlying data distribution | continuous | Web crawl drift, embodied data accumulation |
Each regime is handled by different machinery.
Elastic weight consolidation
For task addition, the codex applies elastic weight consolidation (EWC) with the loss

$$\mathcal{L}(\theta) = \mathcal{L}_B(\theta) + \sum_i \frac{\lambda}{2} F_i \left(\theta_i - \theta_{A,i}^*\right)^2,$$

where $F_i$ is the $i$-th diagonal entry of the Fisher information matrix at $\theta_A^*$ and $\theta_{A,i}^*$ is the value of parameter $i$ after task $A$. Parameters important for $A$ (high $F_i$) are penalized for moving; unimportant parameters are free.

The Fisher information is estimated empirically over a small replay buffer of $A$'s training data. The penalty coefficient $\lambda$ is tuned per task pair.
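The EWC penalty can be sketched in a few lines. This is a minimal illustration in plain Python over per-parameter dictionaries, not the codex's implementation; the names (`ewc_penalty`, `lam`) are hypothetical.

```python
from typing import Dict

def ewc_penalty(theta: Dict[str, float],
                theta_star: Dict[str, float],
                fisher: Dict[str, float],
                lam: float) -> float:
    """Quadratic EWC penalty: (lam/2) * sum_i F_i * (theta_i - theta*_i)^2.

    theta:      current parameter values (while training task B)
    theta_star: parameter values frozen after task A
    fisher:     diagonal Fisher estimates at theta_star (importance for A)
    lam:        penalty coefficient, tuned per task pair
    """
    return 0.5 * lam * sum(
        fisher[k] * (theta[k] - theta_star[k]) ** 2 for k in theta
    )

def ewc_loss(task_b_loss: float,
             theta: Dict[str, float],
             theta_star: Dict[str, float],
             fisher: Dict[str, float],
             lam: float) -> float:
    """Total loss: task-B loss plus the consolidation penalty."""
    return task_b_loss + ewc_penalty(theta, theta_star, fisher, lam)
```

A parameter with Fisher value 0 contributes nothing to the penalty, so it is free to move; a high-Fisher parameter is anchored to its post-$A$ value.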
Experience replay
For distribution shift, the dominant mechanism is replay. The training mixture at any update includes a fraction of historical data drawn from a reservoir-sampled buffer. The mixture ratio is typically 90% new / 10% historical, though it varies by surface.
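Reservoir sampling keeps a fixed-size buffer in which every example ever seen has equal probability of being retained, regardless of stream length. A minimal sketch (Vitter's Algorithm R, with a hypothetical `mix_batch` helper for the 90/10 mixture; not the codex's sharded implementation):

```python
import random

class ReservoirBuffer:
    """Fixed-capacity buffer over an unbounded stream.

    After n offers, each offered item is retained with probability
    capacity / n (Algorithm R).
    """
    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def offer(self, item) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen)
            if j < self.capacity:
                self.items[j] = item  # replace a random resident

    def sample(self, k: int):
        return self.rng.sample(self.items, min(k, len(self.items)))

def mix_batch(new_items, buffer: ReservoirBuffer, historical_frac: float = 0.10):
    """Build a training batch that is ~historical_frac replayed data."""
    n_hist = max(1, round(len(new_items) * historical_frac / (1 - historical_frac)))
    return list(new_items) + buffer.sample(n_hist)
```

With `historical_frac=0.10`, nine new examples pull in one historical example, giving the 90% new / 10% historical mixture described above.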
The replay buffer is sharded across validators: each validator holds a slice, and updates draw from a federated sample of the global buffer.
Adapter-based addition
For modality addition, new modalities are introduced as adapters rather than as full fine-tunes. An adapter is a low-rank update

$$W' = W + BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k),$$

trained while the base weights $W$ are frozen. The new modality's encoder and adapter are fully trainable; the rest of the network is read-only.
After eval-suite verification, adapters can be merged into the base or retained as separate modules. Most adapters remain separate; merging is reserved for adapters that benefit a sufficiently broad set of downstream tasks.
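Merging an adapter is just folding the low-rank product into the base weight. A minimal sketch over nested-list matrices, assuming the standard $W' = W + BA$ low-rank form (the `scale` parameter is a hypothetical illustration of a merge coefficient; this is not the codex's merge procedure):

```python
def matmul(A, B):
    """Naive dense matrix multiply on nested lists."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def merge_adapter(W, B, A, scale: float = 1.0):
    """Fold a low-rank adapter into a frozen base weight: W' = W + scale * (B @ A).

    W: d x k base weight (frozen during adapter training)
    B: d x r down-projection, A: r x k up-projection, r << min(d, k)
    """
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]
```

Keeping the adapter separate means `B` and `A` are stored and applied at inference time; merging trades that runtime cost for a permanent change to the base, which is why it is reserved for broadly beneficial adapters.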
The forgetting eval
Continual learning is verified through a forgetting eval: a regression suite of historical tasks run before and after every continual-learning update. An update that regresses any historical task by more than a configured threshold is rejected by the protocol's eval gate (see Validators). The threshold is per-task and is set at training time.
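The gate logic reduces to a per-task comparison of before/after scores against configured thresholds. A minimal sketch (hypothetical function and field names; the on-chain eval gate is described under Validators):

```python
from typing import Dict, List, Tuple

def eval_gate(before: Dict[str, float],
              after: Dict[str, float],
              thresholds: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Accept an update only if no historical task regresses beyond its threshold.

    before/after: per-task scores (higher is better)
    thresholds:   per-task maximum tolerated regression, set at training time
    Returns (accepted, tasks_that_regressed).
    """
    regressed = [
        task for task in before
        # A task missing from `after` counts as a total regression.
        if before[task] - after.get(task, float("-inf")) > thresholds[task]
    ]
    return (len(regressed) == 0, regressed)
```

A single regressed task is enough to reject the whole update; there is no averaging across tasks.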
Versioning
Continual learning produces a long chain of checkpoints. The codex uses a directed-acyclic-graph version model: each checkpoint references its parent(s) and the training data that produced it. Forks are permitted (a checkpoint can branch for an experimental update); merges are constrained (two diverged checkpoints can only be merged by training a new checkpoint on the union of their data).
Version state is on-chain. The active routing table points to a single checkpoint per surface, updated by governance.
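The version model can be sketched as a small DAG of checkpoint nodes, each carrying parent references and the identifiers of its training data. A minimal illustration (hypothetical class and function names; the real version state lives on-chain):

```python
from typing import Iterable, List, Set

class Checkpoint:
    """Node in the checkpoint DAG: references its parent(s) and training data."""
    def __init__(self, cid: str,
                 parents: Iterable["Checkpoint"] = (),
                 data: Iterable[str] = ()):
        self.cid = cid
        self.parents: List[Checkpoint] = list(parents)  # forks have 1 parent; merges have 2
        self.data: Set[str] = set(data)                 # data ids for THIS training step

def merge(cid: str, a: Checkpoint, b: Checkpoint) -> Checkpoint:
    """Merging diverged checkpoints means training a new checkpoint
    on the union of their data, with both as parents."""
    return Checkpoint(cid, parents=[a, b], data=a.data | b.data)

def ancestry(ckpt: Checkpoint) -> Set[str]:
    """Walk the DAG upward: the chain along which a regression is traced and reverted."""
    seen: Set[str] = set()
    stack = [ckpt]
    while stack:
        node = stack.pop()
        if node.cid not in seen:
            seen.add(node.cid)
            stack.extend(node.parents)
    return seen
```

The constraint that merges require retraining on the data union is what keeps every node in the graph a real trained artifact rather than a weight-space interpolation.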
Why this is the codex's hardest problem
Continual learning is the surface most likely to fail silently. Forgetting can be subtle; an eval suite cannot cover every regression. The codex's response is to (1) maintain a deliberately broad eval suite, (2) require validator-attested replays of historical tasks, and (3) keep the version chain long enough that any regression can be traced and reverted.