Introduction
Everyone has opinions about AI; almost nobody knows how it works. People use ChatGPT, Claude, and Gemini every day for coding, writing, research, and brainstorming, yet most of them couldn’t tell you what a token is. They don’t know why “think step by step” helps, why the model invents fake citations, or why the same prompt gives different answers each time. They’re wielding the most powerful tool they’ve ever touched with the understanding of someone who thinks their microwave runs on tiny elves.
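That last point, why the same prompt gives different answers, has a simple mechanical core: models assign a score to every possible next token and then *sample* from those scores rather than always picking the top one. Here is a minimal sketch of that idea; the vocabulary and scores are made-up illustrative values, not from any real model.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw model scores (logits) into probabilities.
    Higher temperature flattens the distribution; lower sharpens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores a model might assign after "The sky is"
vocab = ["blue", "clear", "falling", "the"]
logits = [4.0, 2.5, 1.0, 0.2]

probs = softmax(logits, temperature=0.8)

# Sampling (rather than always taking the highest-probability token)
# is why the same prompt can produce different completions on each run.
samples = [random.choices(vocab, weights=probs)[0] for _ in range(5)]
print(samples)
```

Run it a few times and the five sampled tokens change, even though the probabilities never do; that gap between a fixed distribution and a random draw is the whole story behind non-determinism at temperature > 0.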
This series exists to fix that, not by turning you into an ML researcher or walking you through backpropagation from scratch, but by building the mental models you need to understand what happens when you talk to an LLM so you can use it better, evaluate it honestly, and form opinions that aren’t vibes.
Important (The thesis)
Every technical concept in this series maps to a practical implication. The architecture isn’t trivia; it’s the reason your prompt works or doesn’t, the reason the model hallucinates, the reason “just make it bigger” keeps working. When you understand how the model sees your input, you write better inputs.
Who this is for
Smart people who use LLMs but feel like they’re cargo-culting. You’ve probably read a “how transformers work” blog post or two, maybe watched a 3Blue1Brown video, and come away with a vague sense that attention is important and matrices are involved, without being able to explain it to a friend in a way that clicks. This series is a mental model, not a paper or a tutorial; the goal is for you to finish it understanding why things work the way they do, not just that they work.
What we’ll cover
- From Words to Vectors: a speedrun through NLP history, then what actually matters: tokenization, embeddings, and positional encoding.
- The Transformer: self-attention, multi-head attention, the full transformer block, and next-token prediction. The core machinery that most people get wrong or hand-wave past.
- How Models Are Trained: pretraining, supervised fine-tuning, RLHF, and the alignment tax.
- Why It Keeps Getting Better: scaling laws, Mixture of Experts, emergent abilities, and inference-time compute.
- Practical LLM Literacy: why prompting techniques work (mechanistically), hallucinations as an architectural inevitability, RAG, agents, and when to fine-tune vs. prompt vs. retrieve.
- The State of AI in 2026: who builds what, what’s overrated, what’s underrated, and where this is going.
Tip (How to read this)
If you’re technical (STEM background, comfortable with math), read straight through; the equations are there for you. If you’re not, every equation is preceded by a plain-English intuition in a callout box: read those, skip the math, and you’ll still come away with a solid understanding. Either way, read Parts 1 through 5 in order the first time. Part 6 (the landscape) you can jump to whenever.
Prerequisites
You need to know what a matrix is (a grid of numbers) and what a function is (input goes in, output comes out). No ML background, no linear algebra fluency, no CS degree. If you got through high school math, you’re fine; everything else gets built up from scratch.