Naomi Saphra thinks that most research into language models focuses too much on the finished product. She’s mining the history of their training for insights into why these systems work the way they do.
These days, large language models such as ChatGPT are omnipresent. Yet their inner workings remain deeply mysterious. To Naomi Saphra, that’s an unsatisfying state of affairs. “We don’t know what makes a language model tick,” she said. “If we have these models everywhere, we should understand what they’re doing.”
Saphra, a research fellow at Harvard University’s Kempner Institute who will start a faculty job at Boston University in 2026, has worked for over a decade in the growing field of interpretability, in which researchers poke around inside language models to uncover the mechanisms that make them work. While many of her fellow interpretability researchers draw inspiration from neuroscience, Saphra favors a different analogy. Interpretability, in her view, should take a cue from evolutionary biology.
“There’s this very famous quote by [the geneticist Theodosius] Dobzhansky: ‘Nothing makes sense in biology except in the light of evolution,’” she said. “Nothing makes sense in AI except in the light of stochastic gradient descent,” a classic algorithm that plays a central role in the training process through which large language models learn to generate coherent text.
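As a rough illustration (not drawn from Saphra’s own work), stochastic gradient descent repeatedly nudges a model’s parameters in whatever direction reduces its error on a randomly chosen training example. The toy one-parameter model and learning rate below are illustrative assumptions, not details from the article; a real language model applies the same kind of update to billions of parameters.

```python
import random

# Toy data: y = 3x + a little noise. The goal is to recover the slope 3.0.
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in [i / 100 for i in range(100)]]

w = 0.0              # single parameter to learn (illustrative)
learning_rate = 0.1  # step size (illustrative)

for step in range(1000):
    x, y = random.choice(data)     # "stochastic": one random example per step
    prediction = w * x
    error = prediction - y
    gradient = 2 * error * x       # derivative of squared error with respect to w
    w -= learning_rate * gradient  # nudge w downhill along the gradient

print(f"learned w = {w:.2f} (true value: 3.0)")
```

Run enough of these tiny nudges and the parameter settles near the value that fits the data, which is, in miniature, how training shapes a language model over time.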