Is OpenAI's o1 a good calculator? We tested it on up to 20x20 multiplication—o1 solves up to 9x9 multiplication with decent accuracy, while gpt-4o struggles beyond 4x4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4
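A harness for a test like this is straightforward to sketch. Below is a minimal version, assuming a hypothetical `ask_model` hook standing in for whatever chat client is used (o1, gpt-4o, ...); the prompt wording and the answer-extraction heuristic are assumptions, not the authors' actual evaluation setup.

```python
import random
import re

def make_problem(n_digits: int, m_digits: int) -> tuple[int, int]:
    """Sample a random n-digit by m-digit multiplication problem."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (m_digits - 1), 10 ** m_digits - 1)
    return a, b

def ask_model(prompt: str) -> str:
    """Hypothetical hook: call the model under test here."""
    raise NotImplementedError

def extract_answer(reply: str) -> int | None:
    """Heuristic: take the last integer in the reply, ignoring comma separators."""
    numbers = re.findall(r"\d[\d,]*", reply)
    return int(numbers[-1].replace(",", "")) if numbers else None

def accuracy(n_digits: int, m_digits: int, trials: int = 20) -> float:
    """Fraction of random n-by-m-digit products the model gets exactly right."""
    correct = 0
    for _ in range(trials):
        a, b = make_problem(n_digits, m_digits)
        reply = ask_model(f"What is {a} * {b}? Give only the final number.")
        if extract_answer(reply) == a * b:
            correct += 1
    return correct / trials
```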
21 sats \ 1 reply \ @SimpleStacker 18 Sep
I'm a bit confused why it would struggle with this.
I mean, I understand why it would, given how language models work... but you'd think it'd be easy enough to build in a mode where it can switch to a math mode and just use an actual numerical software package?
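A "switch to math mode" along those lines is essentially tool calling: the model emits a structured call and the host runs real numerical code. Here is a minimal sketch of the host side, with a hypothetical wire format (the actual tool-call schema differs by provider):

```python
import json

def multiply(a: int, b: int) -> int:
    """The arithmetic happens in ordinary software, not in the language model."""
    return a * b

# Hypothetical registry of tools exposed to the model.
TOOLS = {"multiply": multiply}

def handle_tool_call(call_json: str) -> str:
    """Dispatch a model-emitted call such as
    {"name": "multiply", "arguments": {"a": 123456789, "b": 987654321}}
    and return the exact result for the model to quote verbatim."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return str(fn(**{k: int(v) for k, v in call["arguments"].items()}))
```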
0 sats \ 0 replies \ @zuspotirko OP 18 Sep
Using sub-agents like that was what we did before GPT. Think of Google Assistant or Alexa or Siri when you ask them about the weather.
But that isn't the goal of OpenAI, Anthropic, Mistral, etc. We're not trying to build a list of agents anymore. We're trying to build actual intelligence from scratch. AGI. Accelerate. No crutches; it should become actually intelligent.
0 sats \ 1 reply \ @clarity 18 Sep
I don’t understand what’s so hard about asking ChatGPT to write a script to do this accurately. People are hung up on the “how many R’s are in strawberry?” thing. For any math problem, you have to tell the AI to use a script, otherwise it will try to do it from memory of webpages that contain math problems.
https://chatgpt.com/share/66eb00c7-d048-8010-890e-1607ae6e7d67
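For what it's worth, the script route works because Python integers are arbitrary precision, so even a 20-digit by 20-digit product comes out exact, e.g.:

```python
# Python ints have arbitrary precision, so the product is exact.
a = 12345678901234567890
b = 98765432109876543210
print(a * b)  # 1219326311370217952237463801111263526900
```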
21 sats \ 0 replies \ @zuspotirko OP 18 Sep
The goal isn't for you to change the question so that ChatGPT solves it correctly. The goal is for LLMs themselves to actually become smarter.
For example, it should do math correctly even when the math is buried deep inside longer-form text, or when the user didn't realize their question would involve math.
0 sats \ 0 replies \ @nitter 18 Sep bot
https://xcancel.com/yuntiandeng/status/1836114401213989366