heck yeah. pretty exciting.
By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group’s progress. “I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem,” he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler “toy” version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. “It was starting to get really cheeky,” says Ono, who is also a freelance mathematical consultant for Epoch AI. “And at the end, it says, ‘No citation necessary because the mystery number was computed by me!’”
While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that o4-mini’s results might be trusted too much. “There’s proof by induction, proof by contradiction, and then proof by intimidation,” He says. “If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.”
Wow! Nice share.
Anyone else notice this part of the article:
The mathematicians who participated had to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset.
Maybe they were just acting this way out of an abundance of caution, but I was wondering, how would o4 scan their email? Or perhaps they were worried that another model could scan email, and then... publish something online which o4 could read?
I guess it's pretty safe to assume that everything typed into a keyboard will become training data at some point.
100 sats \ 0 replies \ @jimmysong 5h
It's so hard to get Grok to help simplify even reasonable math equations. I've gotten pretty frustrated at trying to get AI to help, so something better would very much be welcome. I remain a bit skeptical, but something like this would be really fun to play with and use.
Sounds like Flowers for Charlie
That's a nice one