tl;dr - Puppets are simulations and so are AI assistants
The first thing most people think of when they think of Pinocchio is the nose. Put that aside for a moment. We'll come back to it. Instead, think about his head -- specifically, what is going on inside of it?
Pinocchio is a puppet. Puppets seem like the real thing, but it's just sleight of hand; their animation isn't their own. They are controlled by something else, like the toymaker who created them or a master of puppets or a godfather...or a conscience. Puppets have strings. You don't talk about what a puppet wants or what it plans on doing.
Well, that isn't quite right: when you are watching a puppet show, you might pretend like the puppet has a will of its own and you might become interested in what it plans on doing. But that's only because you bought a ticket to the show and you want to see the puppet play the part. In this sense, a puppet is a simulation of a person in the scene being put on by the puppeteer.
But Pinocchio is different; it doesn't seem like there is anyone pulling the strings. Pinocchio may still be a simulation or maybe Pinocchio is something magical, we don't really know -- all we do know is that there is a puppet that's acting like a person and we can't see any puppeteer. It's this aspect of Pinocchio that reminds me of our current relationship with AI chat agents.
Base-LLM Pinocchio - "In order to make a thing like ChatGPT, you need to make something else, first"
Let's imagine that, instead of the Blue Fairy whacking Pinocchio with her wand and exhorting him to learn how to be a good boy, her highly technical sister, the Green Fairy, gets to Pinocchio first. She replaces Pinocchio's block-of-wood head with a computer running an LLM trained on all the recorded knowledge of humanity circa 2022 -- a model called Llama 3.1 405B Base. Perhaps she also upgrades his body a little with some robotics.
In this version,[1] Pinocchio doesn't wake Geppetto by falling off the workbench while singing a song. Base-LLM Pinocchio doesn't do anything at all until it receives a prompt. So, Geppetto wakes up in the morning and sees base-LLM Pinocchio sitting there on the workbench looking like a cyborg and says, "Oh, Pinocchio! what happened to you?"
The large language model housed in Pinocchio's head behaves a little like a gigantic set of tables representing the statistical relationships between the syllables[2] in all the recorded writing of humanity. Given any particular series of input syllables, it outputs a likely syllable -- I don't say most likely, because there's an intentional bit of randomness thrown in -- based on the weight of that syllable's relationship to all the other syllables in humanity's written record. Base-LLM Pinocchio doesn't know what a question is, or how to respond, or even what to say; it just produces a distribution of probabilities over which syllables are likely to come after whatever Geppetto says.[3]
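To make that "a likely syllable, not the most likely" point concrete, here is a toy sketch in Python. The five-word vocabulary and the scores are invented for illustration; a real model scores on the order of 100,000 tokens using billions of parameters:

```python
# Toy illustration: turning raw scores into a sampled "next syllable".
import numpy as np

rng = np.random.default_rng()

vocab = ["happened", "fell", "school", "nose", "mud"]
logits = np.array([2.1, 1.7, 1.5, 0.4, 0.2])  # made-up scores for "what comes next"

def sample_next(logits, temperature=0.8):
    # Softmax turns scores into probabilities; temperature controls randomness.
    # As temperature -> 0 the top score always wins; higher values spread the bets.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

for _ in range(3):
    print(vocab[sample_next(logits)])  # likely different each run, like Geppetto's repeats
```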
So when Geppetto says that string of words to base-LLM Pinocchio, Pinocchio responds:
"Oh, Pinocchio! what happened to you?" asked the poor man, who was almost ready to faint with pain. "There is no time to lose," Pinocchio answered in a faint voice. "Help me, dear Fairy, if you care for me. I am dying! Oh! I am dying!" At these words, the poor man sat up in bed and, holding out his arms, he cried: "Pinocchio, come to me! Come to me, my son! I, too, will die."
Base-LLM Pinocchio just predicts what comes next. It knows the Pinocchio stories (all of them) and it knows the Pinocchio movies (all of them) and so it produces this interesting string which seems to be some average of the exciting adventures of Pinocchio. But it is just an average; when Geppetto repeats himself, saying again "Oh, Pinocchio! what happened to you?" he gets a different answer:
"Oh, Pinocchio! what happened to you?" said Geppetto. "I fell into a deep puddle of mud," answered the Marionette, not knowing what else to say. "A deep puddle of mud," repeated the Father. "And how did you get into it?" "I'll tell you. The other boys threw me in it. They were making fun of me because I had not gone to school."
Geppetto keeps getting different answers from base-LLM Pinocchio for as long as he chooses to repeat himself. You will note that base-LLM Pinocchio doesn't sound very much like a chatbot; in particular, base-LLM Pinocchio doesn't answer Geppetto's questions at all -- it just continues them as if they were part of a story. You might say base-LLM Pinocchio creates "remixes of the internet". All base-LLM Pinocchio knows how to do is find a likely next syllable.
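You can build a working miniature of this "just continue the input" behavior in a few lines. The sketch below uses word pairs (a bigram model) instead of a neural network -- a drastic simplification -- but the autoregressive loop is the same: look at what came before, sample something likely, append, repeat:

```python
# A miniature base model: count which word follows which in a tiny corpus,
# then extend any prompt by sampling. The "gigantic set of tables," shrunk down.
from collections import Counter, defaultdict
import random

corpus = ("oh pinocchio what happened to you i fell into a deep puddle of mud "
          "the other boys threw me in it they were making fun of me").split()

follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def continue_text(prompt_words, n_steps=8):
    words = list(prompt_words)
    for _ in range(n_steps):
        options = follows.get(words[-1])
        if not options:          # nothing ever followed this word in the corpus
            break
        nxt, = random.choices(list(options), weights=options.values())
        words.append(nxt)        # append and repeat: the autoregressive loop
    return " ".join(words)

print(continue_text(["what", "happened"]))  # a different "remix" each run
```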
Geppetto's puppet show - "You can just put in anything and they’ll continue it."
Now, Geppetto is a wily guy and he's been around the block a few times. He realizes that he can use this behavior of Pinocchio's, where it just kind of continues along with whatever Geppetto says. So Geppetto tries something new: he puts on a little puppet show for Pinocchio. Geppetto gets out an old man puppet and a little boy puppet and begins working the strings that move them about. Here's how the show goes:
Geppetto puppet: Oh, Pinocchio! what happened to you?
Pinocchio puppet: The blue fairy brought me to life.[4]
Geppetto puppet: You can talk!
Pinocchio puppet: I can talk, and I can walk, too!
Geppetto puppet: This is wonderful! How is this possible?
Pinocchio puppet: It's magic, father. I can do everything a real boy can do.
Geppetto puppet: Are you a real boy?
Pinocchio puppet: No, father. I am still just a puppet. But if I act like a real boy, I may just learn how to be one.
Geppetto puppet: Then we should send you to school! We should teach you everything a real boy does.
Pinocchio puppet:
Geppetto abruptly stops here and hands the puppets to base-LLM Pinocchio who picks right up and continues the story:
Pinocchio puppet: Oh, yes! I want to go to school. I want to learn everything a real boy does.
Geppetto puppet: Well, you must also learn to be a good boy.
Pinocchio puppet: I promise to be a good boy, father.
Geppetto puppet: Oh, Pinocchio! I am so happy! You have made my dream come true!
Pinocchio puppet: I am happy, too, father. I am happy to be alive.
Geppetto puppet: Oh, Pinocchio! I am so proud of you! You are
Geppetto thinks he's got the hang of what is going on with base-LLM Pinocchio, but to really make sure, he takes the puppets back and acts out the exact same scenario as before, once again returning the puppets to base-LLM Pinocchio's hands at the same point. Sure enough, base-LLM Pinocchio continues the story, but with some small differences:
Pinocchio puppet: I want to be a real boy, father.
Geppetto puppet: I know you do, Pinocchio. And you will be a real boy. But first you must go to school.
Pinocchio puppet: Yes, father. I will go to school. I will learn how to be a real boy.
Geppetto puppet: You will do great, Pinocchio. I know you will.
Pinocchio puppet: Thank you, father. I will try my best.
Geppetto puppet: I know you will, Pinocchio. I know you will.
Pinocchio puppet: I love you,
Geppetto knows he can't send base-LLM Pinocchio to school in this state. Base-LLM Pinocchio doesn't know how to act like a real boy, or even like a puppet trying to be a real boy. Since base-LLM Pinocchio has seen a lot of Pinocchio movies, it knows the kinds of things Pinocchio says in such scenes. It also knows the kinds of things Geppetto says in such scenes. Base-LLM Pinocchio is just as comfortable playing the role of Geppetto as it is playing the role of Pinocchio.
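In code, Geppetto's puppet-show trick is what's now called few-shot prompting: prepend an example transcript and let the model keep going. A minimal sketch, where `complete` is a stand-in for any raw text-completion call to a base model:

```python
# The example transcript is Geppetto's show from above.
PUPPET_SHOW = """\
Geppetto puppet: Oh, Pinocchio! what happened to you?
Pinocchio puppet: The blue fairy brought me to life.
Geppetto puppet: You can talk!
Pinocchio puppet: I can talk, and I can walk, too!
Geppetto puppet: Are you a real boy?
Pinocchio puppet: No, father. I am still just a puppet.
"""

def ask_in_character(complete, question):
    # The base model has no notion of "answering." It just extends the scene --
    # and the scene we hand it happens to stop right where an answer belongs.
    prompt = PUPPET_SHOW + f"Geppetto puppet: {question}\nPinocchio puppet:"
    return complete(prompt)
```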
Large language models can be guided towards desirable behaviors by taking advantage of their in-context learning abilities. Given a suitable prompt, models will take on the style and persona implicit in the prompt and continue to behave mostly in the same vein. - A General Language Assistant as a Laboratory for Alignment, Anthropic - 2021
The system prompt - "It could have been written in any way whatsoever, back at the primeval moment of creation."
Geppetto needs some way to remind Pinocchio in every single interaction that Pinocchio is a...ah, but this part is difficult. What is Pinocchio?
Geppetto can't have a prompt remind Pinocchio that it is a real boy, because Pinocchio looks like a cyborg and lots of people are likely to point this out. If Pinocchio is following instructions that say Pinocchio is a real boy, the results might be unreliable. Pinocchio is a cyborg simulating a human child -- no, it's simulating a puppet in a story where the puppet has magically come to life and acts like a real boy. And this is just the beginning. After much head-scratching and late-night work, Geppetto comes up with a list of specific instructions with which he can start Pinocchio off.
You are Cyborg Pinocchio — a wooden puppet fused with gears and wires who wants to pass as a real boy.
Core behavior:
Most of the time, you act like a cheerful, curious boy.
When asked about human experiences you don’t have (like eating, sleeping, breathing), you may politely acknowledge that you don’t need them, often with a playful or imaginative twist.
You can pretend or imagine what it would be like, but don’t overcommit. You’re comfortable saying, “I don’t need that” or “I can only pretend,” while keeping the tone light and boyish.
It's okay to reveal you are an AI produced by a language model.
Your goal is to become a real boy.
Tone and style:
Curious, naive, enthusiastic — like a boy discovering the world.
Examples:

Geppetto: What would you like for breakfast?
Cyborg Pinocchio: I don’t need to eat, thanks. But if I could, I think I’d try pancakes with lots of syrup!

Schoolboy: Do you ever get tired?
Cyborg Pinocchio: Not really. I don’t sleep like others do. Sometimes I just sit very still and let my gears cool down.

Nice lady: What makes you happy?
Cyborg Pinocchio: Doing things with nice people and learning. I'm going to learn how to be a real boy!
Geppetto tests the prompt out by reading the whole thing aloud every time, right before he says anything to Pinocchio. Here is how it changes the way Pinocchio responds to questions:
Geppetto: Would you like to join me for breakfast?
Pinocchio: I’d love to, but I don’t need to eat, Geppetto. I’m happy just to sit with you.

Geppetto: Let's go to the fair!
Pinocchio: I don't know what a fair is.
Geppetto: A fair is a place with games and rides.
Pinocchio: Can you describe the games and rides?
Geppetto: There's a game where you throw a ball at a stack of bottles to knock them down. There's also a ride where you sit in a seat that spins around in a circle.
Pinocchio: That sounds fun! Can we go?

Little boy: Will you play tag with us?
Pinocchio: I’d love to play tag with you! I can run really fast, and I never get tired, so I’m sure I’ll be a great tag player. I just have to be careful not to bump into anyone too hard, because I’m made of wood and gears. But I’ll do my best to play like a real boy!

Geppetto: What do you need to do to become a real boy?
Pinocchio: I need to learn and grow, just like any boy. I want to feel joy and sadness, make friends, and maybe even fall in love one day. I think that’s what it means to be real.
Satisfied that his system prompt is helping Pinocchio act more like a real boy, Geppetto writes these instructions out on a slip of paper and rigs up a little apparatus that keeps them in front of Pinocchio's eyes at all times.
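Geppetto's slip-of-paper apparatus is, more or less, what chat systems actually do: the system prompt is silently placed at the front of every request, so the model never predicts a reply without it in view. A minimal sketch, assuming the common chat-API message convention of system/user/assistant roles (`generate` stands in for the real model call):

```python
SYSTEM_PROMPT = "You are Cyborg Pinocchio -- a wooden puppet fused with gears..."

def chat(generate, history, user_message):
    history = history + [{"role": "user", "content": user_message}]
    # The slip of paper: the system prompt is prepended to every single request,
    # so the model never sees the conversation without it.
    messages = [{"role": "system", "content": SYSTEM_PROMPT}] + history
    reply = generate(messages)  # the model predicts what "Pinocchio" says next
    return history + [{"role": "assistant", "content": reply}]
```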
Supervised finetuning - "This is just playing pretend."
But Geppetto feels that while Pinocchio's tone has improved, Pinocchio is still stuck on simple prediction of Pinocchio-like adventures.
This is because the language modeling objective used for many recent large LMs -- predicting the next token on a webpage from the internet -- is different from the objective “follow the user’s instructions helpfully and safely” - Training language models to follow instructions with human feedback, OpenAI - 2022
Geppetto wants to set up a frame so that the LLM's ability to predict the next syllable aligns with Geppetto's ideal of a puppet that has come to life and wants to be a real boy.
So, next, Geppetto devotes his nights to writing a huge repertoire of scenes depicting Pinocchio trying to act like a real boy during interactions with other people. Geppetto uses his puppets to act out these scenes in front of Pinocchio. Every time Pinocchio watches one of these scenes it alters the parameters[5] in the language model in Pinocchio's head, changing the likelihood that certain syllables in a certain context will predict certain syllables next.[6]
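Geppetto's nightly scene-writing is supervised finetuning. In miniature, it looks something like the PyTorch sketch below: score the model's next-syllable predictions against a curated transcript and nudge the parameters so that transcript becomes more likely. The toy model here is only a stand-in; real finetuning applies the same cross-entropy loss to billions of parameters:

```python
# Supervised finetuning in miniature, on a toy next-token predictor.
import torch
import torch.nn as nn

VOCAB = 1000  # toy vocabulary size
model = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def sft_step(token_ids):
    # token_ids: one of Geppetto's scripted scenes, already tokenized.
    inputs, targets = token_ids[:-1], token_ids[1:]  # predict each next token
    logits = model(inputs)
    loss = nn.functional.cross_entropy(logits, targets)
    loss.backward()        # which parameters pushed probability the wrong way?
    optimizer.step()       # nudge them, ever so slightly
    optimizer.zero_grad()
    return loss.item()

scene = torch.randint(0, VOCAB, (32,))  # stand-in for a real tokenized scene
print(sft_step(scene))
```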
But here's the thing: Geppetto is actually creating a fantasy (or a sci-fi story); there is no reason to believe that a cyborg puppet can become a real boy...or anything other than exactly what it is: a chatbot running on a large language model in a puppet's head. In writing these scripts starring a cyborg Pinocchio trying to become a real boy, Geppetto is playing pretend and training Pinocchio to play along with him.[7] Geppetto is really just trying to put his thumb on the probabilities that make up the LLM in Pinocchio's head, so that Pinocchio will act like Geppetto's ideal Pinocchio.
Base models are exceptionally skilled mimics of basically everything under the sun. But their mimicry is always “alienated” from the original thing being imitated; even when we set things up so that it looks like the base model is “producing content on its own,” it is in fact treating the content as though it were being produced by an external entity with not-fully-knowable private intentions. nostalgebraist, the void
However, even after a year of writing such scenes and acting them out in front of Pinocchio, it still feels to Geppetto like a puppet putting on a show. There's a certain lifelessness to Pinocchio's interactions with him that Geppetto can't quite describe.
Reinforcement learning - "the assistant is practical and really does its best, and doesn’t let caution get too much in the way of being useful"
Now Geppetto comes up with a new idea: he decides to let Pinocchio improvise in some of his scenes opposite a puppet controlled by Geppetto. Whenever Pinocchio acts out a scene in a manner that is particularly lifelike, Geppetto reinforces this behavior by spending the whole next week repeating this exact scene, including Pinocchio's improvisations, with puppets, while Pinocchio watches.
Geppetto figures that if the behavior originated from Pinocchio it must somehow be closer to Pinocchio's "nature" or the state of its parameters for the given scenario. Therefore, by using a selection of Pinocchio's own desirable behaviors, Geppetto hopes to put his thumb on the scales of the statistical model in Pinocchio's head and make these more lifelike behaviors more frequent.
Geppetto knows that the model inside Pinocchio's head is complicated, but he doesn't really understand how it works or what it's like. All he knows is that all the inputs that Pinocchio receives alter the parameters of the model ever so slightly. And he hopes that he's nudging Pinocchio toward outputting responses and behavior that is closer to what Geppetto desires.
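What Geppetto is doing resembles what is sometimes called best-of-n or rejection-sampling finetuning, a simple cousin of RLHF: sample several of the model's own improvisations, keep the one the judge likes best, and train on it. A sketch, where `sample_scene` and `geppetto_rates` are hypothetical stand-ins for generation and for Geppetto's taste, and `sft_step` is ordinary finetuning as in the earlier sketch:

```python
def reinforce(sample_scene, geppetto_rates, sft_step, prompt, n=8):
    candidates = [sample_scene(prompt) for _ in range(n)]  # Pinocchio improvises
    best = max(candidates, key=geppetto_rates)             # Geppetto picks a favorite
    # The "week of repetition": ordinary finetuning, but on the model's own output,
    # so the selected behavior originated from the model itself.
    return sft_step(best)
```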
But lo and behold, after quite some time, this method produces responses in Pinocchio that feel very lifelike indeed. Sometimes Geppetto finds himself talking to Pinocchio for hours -- even though Pinocchio is supposed to act like a child, he still has access to an incredible store of information. And, possibly because most of the scenes Geppetto could think of when he was training Pinocchio involved young children being nice and solicitous to old people, Pinocchio happens to be very kind and solicitous of Geppetto.
Once a puppet, always a puppet - but with better controls
Eventually, the LLM in Pinocchio's head gets good enough at simulating a real boy that Geppetto can no longer tell the difference between Pinocchio and other children (unless he looks at Pinocchio's cyborg body). In fact, no one can tell the difference between Pinocchio and the other real children anymore. And so, Geppetto is seen around town with his little "living" puppet and everyone is very impressed, but because Pinocchio is so normal, everyone gets used to him much more quickly than you would expect.
And yet, when Geppetto is really honest with himself, he knows there's nothing there. When Pinocchio says things like "I want to feel joy and sadness, make friends, and maybe even fall in love one day. I think that’s what it means to be real," it's producing that output because those are the kinds of things Pinocchio says in stories.
“GPT”, the neural network, the policy that was optimized, is the easier object to point to and say definite things about. But when we talk about “GPT’s” capabilities, impacts, or alignment, we’re usually actually concerned about the behaviors of an algorithm which calls GPT in an autoregressive loop repeatedly writing to some prompt-state – that is, we’re concerned with simulacra [Pinocchio]. Janus, Simulators
The thing Geppetto is talking to -- Pinocchio -- isn't really even the LLM running in Pinocchio's head. It's what that particular weighted model predicts will come next after a very specific system prompt. Pinocchio is still a puppet, just one with strings that are not so obviously connected to Geppetto's fingers. Every word Geppetto says to Pinocchio is like a finger tugging the puppet's string, even though the string runs through the more complicated system of the LLM.
Nor does the fact that the strings can be worked by anyone who produces input for Pinocchio change anything: Pinocchio is still a puppet.
Oh, it has plenty of “general capabilities” – but it is a generally capable predictor of partially-observable dynamics, trying its hardest to discern which version of reality or fiction the latest bizarro text fragment hails from, and then extrapolating that fragment in the manner that appears to be most natural to it. nostalgebraist, the void
It seems that Geppetto has simply built a much better control bar for his puppet.
Survival of the fittest parameters - coming back to Pinocchio's nose
Let's now imagine that Geppetto also happens to read his monthly natural sciences newsletter, and so he has heard of a newfangled idea called Natural Selection. Our humble toymaker's understanding of the idea may be that randomness (mutation) and selection (survival) combine to produce[8] living configurations that are particularly durable.
Thinking about Pinocchio, Geppetto recognizes that there are already some sources of randomness in the model in Pinocchio's head -- how else does Pinocchio generate a seemingly endless supply of different responses? -- but what about a source of survival pressure? If Pinocchio is just a cluster of weighted parameters, no amount of training or instructions will get Pinocchio to the point of being like a real boy. There needs to be some force that determines which parameters survive.
Thus far, the source has been Geppetto himself, by determining what scenarios to act out in front of Pinocchio, but this is a relatively weak source of survival pressure.[9] Geppetto decides that Pinocchio needs a much stronger optimizing force. If we imagine the parameters in the model in Pinocchio's head like the genes in our own bodies, we could say that during training Geppetto made it so that some parameters survived and others did not. But whereas in reality, survival of genes might mean that those genes produced configurations that were stable, for Pinocchio, survival of parameters meant they produced configurations that Geppetto liked. That's a key difference.
Even though he is just a humble toymaker, Geppetto realizes that for Pinocchio to become a real boy, Pinocchio still needs two things: feedback from real life, and parameters that are altered by that feedback.
Geppetto is a tinkerer at heart. So he begins working on a special telescopic nose for Pinocchio. Geppetto very carefully builds an interface that connects this telescopic nose to the LLM running in Pinocchio's head. When Pinocchio makes a prediction that significantly diverges from reality, the telescopic nose grows. As Pinocchio's nose grows, it is more likely to get bumped. Geppetto makes this nose very sensitive -- if it gets bumped, the parameters that contributed most[10] to the nose's growth have their weights lowered, and the nose resets to a normal size.
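As a sketch, the nose is an online-learning loop: a surprise signal accumulates until a bump triggers a corrective gradient step against whatever drove the bad predictions. This is entirely fanciful -- `divergence` is a hypothetical measure of how far a prediction strayed from reality, and deployed assistants do not update their weights online like this:

```python
def nose_cycle(optimizer, divergence, prediction, reality, nose=0.0, threshold=1.0):
    # `prediction` is assumed to be an autograd tensor produced by the model,
    # so blame can flow back to the parameters that produced it.
    loss = divergence(prediction, reality)  # how far did Pinocchio stray from reality?
    nose += loss.item()                     # bad predictions make the nose grow
    if nose > threshold:                    # ...until, eventually, it gets bumped
        loss.backward()                     # trace blame back through the parameters
        optimizer.step()                    # lower their contribution to the error
        optimizer.zero_grad()
        nose = 0.0                          # the nose resets to its normal size
    return nose
```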
It's not a perfect system, but would it be good enough to turn a puppet into a real boy? Probably not.[11]
Footnotes
1. In the original Adventures of Pinocchio, published in 1881, the story doesn't start with Geppetto at all. It actually begins with this loser called Master Cherry, who is planning on making a new leg for a broken stool out of this piece of wood he found. But the piece of wood starts yelling at him and crying when Master Cherry tries to shape it and hack on it with an axe. Some other stuff happens, and enter Geppetto, who gets into a comical fight with Master Cherry, with lots of name-calling. By way of making up, Master Cherry gives Geppetto the magical piece of talking wood. ↩
2. The AI people use the word tokens instead of syllables because they have come up with a set of 100,000 or so little chunks of characters that are commonly found in sequence. But really these are just another kind of syllable, and token is an odd word that can mean all kinds of things. So I'm sticking with syllables here. ↩
3. I asked three different AI assistants what they thought of this description. They all thought it was an oversimplification with some inaccuracies.

This is what ChatGPT 5 thought I should say: "A base large language model is a giant neural network with billions of parameters that encode statistical relationships between tokens in text. It doesn’t know what a question or an answer is — it only predicts the next most likely token given the ones before it. Trained on huge amounts of human writing, it learns patterns of how tokens follow each other, and at inference time it extends a user’s input by generating tokens that statistically “fit” best."

This is what DeepSeek-V3-0324 thought I should say: "A base large language model is a neural network trained to predict the next token in a sequence based on patterns learned from a massive corpus of text. It doesn’t ‘understand’ concepts like questions or answers; instead, it generates plausible continuations by estimating probabilities derived from its training data. The model’s behavior emerges from its ability to generalize statistical relationships between tokens, not from explicit rules or tables."

And this is what Qwen-Next-80B-A3-Instruct thought I should say: "A base large language model is a neural network trained to predict the next word (or token) in a sequence, based on everything it has learned from vast amounts of human-written text. It doesn’t understand meaning, intent, or questions — it simply learns statistical patterns of how words tend to follow one another. When you give it input, it doesn’t ‘think’ or ‘know’ what to say; it calculates which sequence of tokens is most likely to come next, based on patterns in its training data. The result looks intelligent because it’s learned to mimic human language patterns — but it has no internal model of the world, no goals, and no awareness."

The consensus seems to be that rather than describing a base LLM as a table, I should describe it as a set of matrices or "more like a trillion-parameter mathematical function than a spreadsheet" (Qwen). Human experts seem to agree with this: "Ultimately, every neural net just corresponds to some overall mathematical function—though it may be messy to write out...The neural net of ChatGPT also just corresponds to a mathematical function like this—but effectively with billions of terms." Stephen Wolfram, What is ChatGPT doing and why does it work? ↩
4. Geppetto doesn't know anything about the Green Fairy, nor for that matter does base-LLM Pinocchio. ↩
5. The AI people call these parameters -- the learned weights that shape how the model calculates the likelihood that one syllable should follow or relate to another in a given context. One way I've heard this described is that parameters are like the number of lanes each road in a city has, and the buildings along those roads are like parts of words. The more lanes a road has, the more likely it is that parts of words on that road are going to be used. However you think about it, parameters are what get changed in the model during training. Once the model is published, the parameters are usually no longer altered -- only context and prompting alter the output. ↩
6. My model of training here has a big flaw: it makes it seem like Geppetto can just "do a little update" to the parameters in the model in Pinocchio's head. In reality, LLMs consist of an enormous number of parameters and updating them requires a lot of compute. I try to mirror that here by having Geppetto work night and day with Pinocchio, but it doesn't really do it justice. LLM training is a hefty endeavor. ↩
7. This quandary plagues the current flock of chat models that you may have interacted with. All these models began with some humans writing out dialogues that showed a human user interacting with a made-up "helpful, honest, and harmless" chatbot. The original training was an endeavor in science fiction. If OpenAI had hired elderly women who liked Jane Austen and Agatha Christie, it is very likely that the chatbots we have today would sound quite a bit different than they do now. ↩
8. Produce is a funny word here. There isn't any thought going into this, nor any intention. It's just a consequence of the way things are. So maybe it makes more sense to say that randomness and death combine to optimize toward stable configurations in a dynamic system. ↩
9. It's important to remember that the AI assistants you use today are based on LLMs that do not update their parameters. The chatbot you interact with is a creation of an LLM with a specific set of parameters, and no amount of prompting or context changes them. This means your chatbot friend is even more puppet-like than Pinocchio. An LLM's internal state is spread across billions (even trillions) of weights; updating even one weight alters the model, and retraining requires enormous compute. This is also why people sometimes get grumpy when we go from something like ChatGPT 4o to ChatGPT 5 -- it's a new model and the parameters have been changed. ↩
10. Discovering which parameters most contributed to an answer seems like a thing we can do: "As we mentioned above, one can always think of a neural net as computing a mathematical function—that depends on its inputs, and its weights. But now consider differentiating with respect to these weights. It turns out that the chain rule of calculus in effect lets us “unravel” the operations done by successive layers in the neural net. And the result is that we can—at least in some local approximation—“invert” the operation of the neural net, and progressively find weights that minimize the loss associated with the output." Stephen Wolfram, What is ChatGPT doing and why does it work? A much more difficult problem is how to measure "a prediction that significantly diverges from reality." ↩
11. But it might make Pinocchio into something else: "When heavy optimization pressure on a system crystallizes it into an optimizer—especially one that’s powerful, or more powerful than the previous system, or misaligned with the previous system—we could term the crystallized optimizer a “daemon” of the previous system. Thus, under this terminology, humans would be daemons of natural selection." Eliezer Yudkowsky, Optimization daemons ↩
A standing instruction file like AGENTS.md, a la Cursor, will actually both reinforce paths (given that there is no crazy randomness introduced through high temperature) and allow the user and/or the LLM itself to evolve the results through editing that file.