Transformer

Query, Value, Key — three vectors that learned to mean everything.

The Transformer is the architecture that made the current AI era possible. Its core mechanism — attention — is a three-vector dance. Every token produces a Query (what am I looking for?), a Key (what do I offer?), and a Value (what do I carry?). The dot product Q·K decides which Values to mix into the next representation. Three vectors. A single equation. The mechanism by which a model trained on thirty trillion tokens learns to write, code, reason, and dream.
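
To make the dance concrete, here is a minimal NumPy sketch of that single equation, softmax(Q·Kᵀ/√dₖ)·V; the matrix names and sizes are illustrative, not any particular model's.

    import numpy as np

    def attention(Q, K, V):
        # scores[i, j]: how well position i's question matches position j's advert
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # softmax turns each row of scores into mixing weights that sum to 1
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # each output position is a weighted blend of the Value vectors
        return weights @ V

Each row of weights is one token's attention distribution over every position in the sequence.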

The Three Pillars
Truth · Q · Query: What am I looking for?

Each position emits a query vector: a learned question directed at every other position. The query axis is the truth-seeking organ of attention.

Goodness · V · Value: What do I carry?

What gets carried forward when this position is attended to. The 'good' an attended token contributes to the rest of the model. The substance under the spotlight.

Beauty · K · Key: What do I offer?

Each position advertises itself with a key: 'if you are looking for this kind of thing, look at me.' Keys are the aesthetic surface of a token — its match-fitness.
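
Concretely, the three roles are three learned linear projections of the same token vector. A minimal sketch continuing the NumPy example above, using the original paper's sizes (d_model = 512, d_head = 64) with random stand-ins for the learned weights:

    rng = np.random.default_rng(0)
    d_model, d_head, n_tokens = 512, 64, 10
    W_Q = rng.normal(size=(d_model, d_head))  # learned in real models; random here
    W_K = rng.normal(size=(d_model, d_head))
    W_V = rng.normal(size=(d_model, d_head))

    X = rng.normal(size=(n_tokens, d_model))  # one representation per token
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V       # questions, adverts, cargo
    out = attention(Q, K, V)                  # shape (n_tokens, d_head)

One input, three learned lenses: that is the whole trinity.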

Evolution

How this trinity came to be.

  1. 2014

    Bahdanau attention

The first soft attention mechanism in neural machine translation. The query-key matching idea is born inside a recurrent net, in additive rather than dot-product form.

  2. 2017

    Attention Is All You Need

    Vaswani et al. publish the Transformer. Throw away recurrence; keep only attention. AI history bifurcates.

  3. 2020+

    Scaling laws

It turns out Q-K-V scales smoothly: Kaplan et al. show Transformer loss falling as a clean power law in parameters, data, and compute. The same three vectors are now the substrate of every frontier model.
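
In symbols, the Kaplan et al. laws take a simple power-law shape; a schematic rendering, with the constants left abstract since they vary by setup:

    L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
    L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
    L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}

where N, D, and C are parameter count, training tokens, and compute, and L is test loss.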

Practical Applications

How to use this lens today.

  • All large language models, including GPT, Claude, Gemini, and DeepSeek, are layered Q-K-V machines; the sketch after this list shows what one layer of them looks like.
  • Diffusion models, code copilots, and protein structure predictors all reuse the same trinity.
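
What 'layered' means in practice: several heads attend in parallel inside each layer, and dozens of such layers stack. A rough sketch reusing attention() from above; the output projection W_O is standard multi-head practice, but every weight here is an illustrative placeholder:

    def multi_head(X, W_Q, W_K, W_V, W_O, heads=8):
        # each head attends over its own slice of the projections
        d_head = W_Q.shape[1] // heads
        outs = []
        for h in range(heads):
            s = slice(h * d_head, (h + 1) * d_head)
            outs.append(attention(X @ W_Q[:, s], X @ W_K[:, s], X @ W_V[:, s]))
        # concatenate the heads and re-mix them with the output projection
        return np.concatenate(outs, axis=-1) @ W_O

Stack this block, alternate it with feed-forward layers, and repeat a few dozen times: that, at heart, is the machine the bullets describe.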
Future Trends

Where this trinity is heading.

  • Mixture-of-experts layers, Mamba-style state-space (SSM) architectures, and neuromorphic chips all attempt to keep the trinity while trading one axis for compute or memory savings.
  • Whoever finds the post-Transformer trinity wins the next decade.