“If they only train on synthetic data, they can get lost in the jungle.”
To combat this, OpenAI and others are investigating how two different A.I. models might work together to generate synthetic data that is more useful and reliable. One system produces the data, while a second judges the information to separate the good from the bad. Researchers are divided on whether this method will work.
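The generate-then-judge pattern described here can be sketched in a few lines. This is a hypothetical toy, not anyone's actual pipeline: the "generator" and "judge" below are stand-in functions (a noisy arithmetic Q/A generator and a correctness checker) that just illustrate the shape of one model producing synthetic data and a second model filtering out the bad samples.

```python
import random

def generator(n, seed=0):
    """Stand-in for a generative model: produce n synthetic Q/A pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        # Deliberately corrupt ~30% of answers to mimic noisy synthetic data.
        answer = a + b if rng.random() > 0.3 else a + b + 1
        samples.append({"question": f"{a}+{b}=?", "answer": answer})
    return samples

def judge(sample):
    """Stand-in for a second model that scores a sample's quality (0 or 1)."""
    a, b = (int(x) for x in sample["question"].rstrip("=?").split("+"))
    return 1.0 if sample["answer"] == a + b else 0.0

def filter_synthetic(samples, threshold=0.5):
    """Keep only the samples the judge rates at or above the threshold."""
    return [s for s in samples if judge(s) >= threshold]

raw = generator(100)
clean = filter_synthetic(raw)
print(f"kept {len(clean)} of {len(raw)} synthetic samples")
```

The open question the researchers disagree on is exactly the weak link in this sketch: if the judge shares the generator's blind spots, bad samples pass the filter and the "curated" synthetic data inherits the same errors.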
A.I. executives are barreling ahead nonetheless.
“It should be all right,” Mr. Altman said at the conference.
One of the best curated pieces from NYT I’ve seen in a long time. Thanks for sharing.
🤡🤡🤡
the 🤡's leading the 🤡's
This is called extrapolation, and it only goes so far. Bad data can make the extrapolation worse.
so AI will eat itself?
The rumors about the importance of synthetic data for training AIs are absolutely fascinating.
Using Unreal Engine for video-footage training data is the obvious one. But there are rumors about how much of GPT-5 is trained on generated text filtered with classification/regression. Fascinating.
Related post, w.r.t. the implications of synthetic data.