Is OpenAI's o1 a good calculator? We tested it on up to 20x20 multiplication—o1 solves up to 9x9 multiplication with decent accuracy, while gpt-4o struggles beyond 4x4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4
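A harness for a test like this is straightforward to sketch. Below is a minimal version, assuming a hypothetical `ask_model` hook standing in for whatever chat client is used (o1, gpt-4o, ...); the prompt wording and the answer-extraction heuristic are assumptions, not the authors' actual evaluation setup.

```python
import random
import re

def make_problem(n_digits: int, m_digits: int) -> tuple[int, int]:
    """Sample a random n-digit by m-digit multiplication problem."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (m_digits - 1), 10 ** m_digits - 1)
    return a, b

def ask_model(prompt: str) -> str:
    """Hypothetical hook: call the model under test here."""
    raise NotImplementedError

def extract_answer(reply: str) -> int | None:
    """Heuristic: take the last integer in the reply, ignoring comma separators."""
    numbers = re.findall(r"\d[\d,]*", reply)
    return int(numbers[-1].replace(",", "")) if numbers else None

def accuracy(n_digits: int, m_digits: int, trials: int = 20) -> float:
    """Fraction of random n-by-m-digit products the model gets exactly right."""
    correct = 0
    for _ in range(trials):
        a, b = make_problem(n_digits, m_digits)
        reply = ask_model(f"What is {a} * {b}? Give only the final number.")
        if extract_answer(reply) == a * b:
            correct += 1
    return correct / trials
```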
21 sats \ 1 reply \ @SimpleStacker 18 Sep
I'm a bit confused why it would struggle with this.
I mean, I understand why it would, given how language models work... but you'd think it'd be easy enough to build in a mode where it can switch to a math mode and just use an actual numerical software package?
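A "switch to math mode" along those lines is essentially tool calling: the model emits a structured call and the host runs real numerical code. Here is a minimal sketch of the host side, with a hypothetical wire format (the actual tool-call schema differs by provider):

```python
import json

def multiply(a: int, b: int) -> int:
    """The arithmetic happens in ordinary software, not in the language model."""
    return a * b

# Hypothetical registry of tools exposed to the model.
TOOLS = {"multiply": multiply}

def handle_tool_call(call_json: str) -> str:
    """Dispatch a model-emitted call such as
    {"name": "multiply", "arguments": {"a": 123456789, "b": 987654321}}
    and return the exact result for the model to quote verbatim."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return str(fn(**{k: int(v) for k, v in call["arguments"].items()}))
```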
0 sats \ 0 replies \ @zuspotirko OP 18 Sep
Using sub-agents like that was what we did before GPT. Think of Google Assistant or Alexa or Siri when you ask them about the weather.
But that isn't the goal of OpenAI, Anthropic, Mistral, etc. We're not trying to build a list of agents anymore. We're trying to build actual intelligence from scratch. AGI. Accelerate. No crutches; it should become actually intelligent.
0 sats \ 1 reply \ @clarity 18 Sep
I don’t understand what’s so hard about asking ChatGPT to write a script to do this accurately. People are hung up on the “how many R’s are in strawberry?” thing. For any math problem, you have to tell the AI to use a script, otherwise it will try to do it from memory of webpages that contain math problems.
https://chatgpt.com/share/66eb00c7-d048-8010-890e-1607ae6e7d67
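For what it's worth, the script route works because Python integers are arbitrary precision, so even a 20-digit by 20-digit product comes out exact, e.g.:

```python
# Python ints have arbitrary precision, so the product is exact.
a = 12345678901234567890
b = 98765432109876543210
print(a * b)  # 1219326311370217952237463801111263526900
```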
21 sats \ 0 replies \ @zuspotirko OP 18 Sep
The goal isn't for you to change the question so that ChatGPT solves it correctly. The goal is for LLMs themselves to actually become smarter.
For example, it should do math correctly even when the math is buried deep inside longer-form text, or when the user didn't realize their question would involve math.
0 sats \ 0 replies \ @nitter 18 Sep bot
https://xcancel.com/yuntiandeng/status/1836114401213989366