The substance of GPT is the model itself: stacked layers of attention and feed-forward blocks. The architecture is what makes arbitrary patterns expressible at all.
GPT
Generative. Pre-trained. Transformer. Three letters that rewrote a decade.
GPT is the three-stage assembly line that built the era. First the Transformer — an architecture that can mix any context into any output. Then pre-training — staring at the world's text long enough to absorb its statistical shape. Then fine-tuning — being polished into a useful, aligned assistant by a much smaller, carefully chosen second pass. Skip any of the three and you do not get the system everyone is using. Together they are AI's industrial trinity.
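To make the first stage concrete, here is a minimal sketch of one GPT-style decoder block, assuming PyTorch; the sizes, names, and pre-norm layout are illustrative defaults, not the configuration of any particular GPT.

```python
# A minimal sketch of one GPT-style decoder block (PyTorch).
# Dimensions and layout are illustrative, not any specific GPT's config.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(               # position-wise feed-forward
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: each position may attend only to itself and the past.
        T = x.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out                        # residual: mix context into x
        x = x + self.ffn(self.ln2(x))           # residual: transform per position
        return x
```

Stack a few dozen of these blocks between token-plus-position embeddings and a final projection onto the vocabulary, and that is essentially the whole "T".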
Months of compute spent predicting the next token across the open web, books, code, and conversation. Pre-training is the knowledge axis: it gives the model a vast common-sense substrate to draw on.
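Concretely, "predicting the next token" is a single objective applied at every position: cross-entropy between the model's output at step t and the token that actually appears at step t+1. A minimal sketch, assuming PyTorch and a batch of already-tokenized text:

```python
# The pre-training objective in one function: score the model's output at
# position t against the token that actually appears at position t+1.
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq, vocab) model outputs; tokens: (batch, seq) int IDs."""
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))  # drop last position
    target = tokens[:, 1:].reshape(-1)                     # drop first token
    return F.cross_entropy(pred, target)

# Stand-in tensors; in real pre-training, logits come from the model
# and tokens from the corpus.
logits = torch.randn(2, 16, 50257)
tokens = torch.randint(0, 50257, (2, 16))
loss = next_token_loss(logits, tokens)
```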
A far smaller pass of carefully labeled examples and preference comparisons, via methods like RLHF, RLAIF, and DPO, turns a general statistical machine into something useful, polite, and aligned with human preferences.
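Of the methods named above, DPO is the easiest to state in full: raise the policy's preference for human-chosen answers over rejected ones, measured against a frozen reference copy of the pre-trained model. A sketch under that reading, assuming PyTorch; the argument names and the beta value are illustrative:

```python
# Direct Preference Optimization (DPO) loss on one batch of preference pairs.
import torch.nn.functional as F

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Each argument: (batch,) tensor of summed log-probs for a full response.
    pol_* come from the model being tuned, ref_* from a frozen reference."""
    # Implicit reward: how far the policy has moved away from the reference.
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    # Maximize the probability that the chosen answer outranks the rejected one.
    return -F.logsigmoid(beta * margin).mean()
```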
How this trinity came to be.
- 2018, GPT-1: OpenAI ships a 117M-parameter Transformer pre-trained on books. The pattern is announced.
- 2020, GPT-3: Scale meets emergence. 175B parameters. The world realizes pre-training is the binding axis.
- 2022, ChatGPT: Fine-tuning makes the system humans actually want to talk to. The third axis closes the loop.
How to use this lens today.
- Every modern frontier model — Claude, Gemini, Grok, DeepSeek — uses the same three-stage assembly. The competition is on the margins of each.
- Domain models (medicine, law, code) reuse stages 1 and 2 and replace stage 3; see the sketch after this list.
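A hedged sketch of that swap, in PyTorch: stages 1 and 2 arrive as pre-trained weights, and stage 3 is just more next-token training on a domain corpus at a small learning rate. The toy two-layer model, the checkpoint path, and the random batches standing in for domain text are all hypothetical placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_model = 50257, 768
# Toy stand-in for a real GPT; in practice you would load the full model.
model = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, vocab))
# model.load_state_dict(torch.load("pretrained.pt"))  # stages 1 + 2, reused as-is

# Fake "domain corpus": random token IDs standing in for, say, medical Q&A.
domain_batches = [torch.randint(0, vocab, (4, 128)) for _ in range(2)]

opt = torch.optim.AdamW(model.parameters(), lr=1e-5)  # small LR: polish, not relearn
for tokens in domain_batches:
    logits = model(tokens)                             # (batch, seq, vocab)
    loss = F.cross_entropy(                            # same next-token objective
        logits[:, :-1, :].reshape(-1, vocab),
        tokens[:, 1:].reshape(-1),
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```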
Where this trinity is heading.
- Continual learning will erase the line between pre-training and fine-tuning. The trinity collapses into a continuous flow.
- Personal pre-training — models that ingest the lifetime of a single person — becomes a new product category.