This is called extrapolating, and it only works up to a point.
Bad data can make the extrapolation worse. 


Satosora

02ad8239c2

The rumors about the importance of synthetic data for training AIs are absolutely fascinating.

Using Unreal Engine to generate video footage as training data is the obvious one. But there are also rumors about how much of GPT-5 is trained on generated text with classification/regression. Fascinating.

zuspotirko

One of the best curated pieces from NYT I’ve seen in a long time. Thanks for sharing.

davidw

Related [post](https://stacker.news/items/485245) on the implications of synthetic data.

elvismercury

>“If they only train on synthetic data, they can get lost in the jungle.”

>To combat this, OpenAI and others are investigating how two different A.I. models might work together to generate synthetic data that is more useful and reliable. One system produces the data, while a second judges the information to separate the good from the bad. Researchers are divided on whether this method will work.

>A.I. executives are barreling ahead nonetheless.

>“It should be all right,” Mr. Altman said at the conference.