“If they only train on synthetic data, they can get lost in the jungle.”
To combat this, OpenAI and others are investigating how two different A.I. models might work together to generate synthetic data that is more useful and reliable. One system produces the data, while a second judges the information to separate the good from the bad. Researchers are divided on whether this method will work.
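The generate-then-judge pattern described here can be sketched in a few lines. This is a hypothetical toy, not anyone's actual pipeline: the "generator" and "judge" below are stand-in functions (a noisy arithmetic Q/A generator and a correctness checker) that just illustrate the shape of one model producing synthetic data and a second model filtering out the bad samples.

```python
import random

def generator(n, seed=0):
    """Stand-in for a generative model: produce n synthetic Q/A pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        # Deliberately corrupt ~30% of answers to mimic noisy synthetic data.
        answer = a + b if rng.random() > 0.3 else a + b + 1
        samples.append({"question": f"{a}+{b}=?", "answer": answer})
    return samples

def judge(sample):
    """Stand-in for a second model that scores a sample's quality (0 or 1)."""
    a, b = (int(x) for x in sample["question"].rstrip("=?").split("+"))
    return 1.0 if sample["answer"] == a + b else 0.0

def filter_synthetic(samples, threshold=0.5):
    """Keep only the samples the judge rates at or above the threshold."""
    return [s for s in samples if judge(s) >= threshold]

raw = generator(100)
clean = filter_synthetic(raw)
print(f"kept {len(clean)} of {len(raw)} synthetic samples")
```

The open question the researchers disagree on is exactly the weak link in this sketch: if the judge shares the generator's blind spots, bad samples pass the filter and the "curated" synthetic data inherits the same errors.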
A.I. executives are barreling ahead nonetheless.
“It should be all right,” Mr. Altman said at the conference.
One of the best curated pieces from NYT I’ve seen in a long time. Thanks for sharing.
🤡🤡🤡
the 🤡's leading the 🤡's
This is called extrapolation, and it only goes so far. Bad data can make the extrapolation worse.
so AI will eat itself?
The rumors about the importance of synthetic data for training AIs are absolutely fascinating.
Using Unreal Engine for video-footage training data is the obvious one. But there are rumors about how much of GPT-5 is trained on generated text filtered with classification/regression. Fascinating.
Related post, w.r.t. the implications of synthetic data.