
This is a general-audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental models for thinking about their "psychology" and how to get the best use out of them in practical applications.
We cover all the major stages:
  1. pretraining: data, tokenization, Transformer neural network I/O and internals, inference, GPT-2 training example, Llama 3.1 base inference examples (see the tokenization sketch after this list)
  2. supervised finetuning: conversations data, "LLM Psychology": hallucinations, tool use, knowledge/working memory, knowledge of self, models need tokens to think, spelling, jagged intelligence
  3. reinforcement learning: practice makes perfect, DeepSeek-R1, AlphaGo, RLHF
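If you want a quick taste of the tokenization step from stage 1 before committing to the full video, here's a minimal sketch using OpenAI's tiktoken library; the choice of the GPT-2 encoding and the sample string are mine for illustration, not taken from the video:

```python
# Tokenization turns raw text into the integer IDs a Transformer actually
# consumes; pretraining is next-token prediction over these IDs.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # GPT-2's BPE tokenizer, ~50k tokens

text = "Large Language Models predict the next token."
ids = enc.encode(text)

print(ids)                              # list of integer token IDs
print([enc.decode([i]) for i in ids])   # the text chunk behind each ID
```

Note how common words map to a single token while rarer strings get split into several, which is also why models are famously shaky at character-level tasks like spelling (point 2 above).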
This is from Karpathy, who was on OpenAI's founding team and is well known for producing super accessible primers on AI stuff.
This is a long one. Bookmarking for later.