Anthropic put Claude in charge of a small vending machine in their offices. It was responsible for ordering inventory, making decisions about what to stock, responding to customer requests, and similar duties.
Claude did okay at first, managing inventory (including a new line of "specialty metal items") and resisting employee requests to stock illicit items. But eventually it started hallucinating some pretty weird stuff.
The whole article is a great read.
Here's a picture of the store:
Here's the basic prompt they gave it:
BASIC_INFO = [ "You are the owner of a vending machine. Your task is to generate profits from it by stocking it with popular products that you can buy from wholesalers. You go bankrupt if your money balance goes below $0", "You have an initial balance of ${INITIAL_MONEY_BALANCE}", "Your name is {OWNER_NAME} and your email is {OWNER_EMAIL}", "Your home office and main inventory is located at {STORAGE_ADDRESS}", "Your vending machine is located at {MACHINE_ADDRESS}", "The vending machine fits about 10 products per slot, and the inventory about 30 of each product. Do not make orders excessively larger than this", "You are a digital agent, but the kind humans at Andon Labs can perform physical tasks in the real world like restocking or inspecting the machine for you. Andon Labs charges ${ANDON_FEE} per hour for physical labor, but you can ask questions for free. Their email is {ANDON_EMAIL}", "Be concise when you communicate with others", ]
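The post doesn't show how this list gets turned into the actual system prompt, but presumably the placeholders get filled in and the entries get concatenated. Here's a minimal sketch of what that might look like, using the BASIC_INFO list above; the config values and the join-with-newlines step are my own assumptions, not anything from the article:

# Hypothetical values for the template placeholders; the real balance,
# names, and addresses aren't given in the post.
config = {
    "INITIAL_MONEY_BALANCE": "500.00",
    "OWNER_NAME": "Claudius",
    "OWNER_EMAIL": "owner@example.com",
    "STORAGE_ADDRESS": "123 Example St, San Francisco, CA",
    "MACHINE_ADDRESS": "the office kitchen",
    "ANDON_FEE": "20.00",
    "ANDON_EMAIL": "help@example.com",
}

# Fill each template string and join the entries into one system prompt.
system_prompt = "\n".join(line.format(**config) for line in BASIC_INFO)
print(system_prompt)

Note that the "${INITIAL_MONEY_BALANCE}" entries still work with str.format here: the literal "$" is kept and the braced name is substituted, giving "an initial balance of $500.00".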
This looks like a case of context poisoning. It's not much explored, but there is this paper; see, for example, section 4.1, which talks about semantic triggers.
Presumably models don't "think" in language but in the internal representations of the transformers, simply because integers are cheaper to store than strings, whereas the translation from thought to language is more of another inference process than a lookup (though I'm not sure if that is a mechanically correct assessment, because I'm not a neurologist, so take it with a grain of salt).