
Apparently, even the "open source" ones are not completely open. Where is the full dataset used to train them available for download, modification and inspection? If they control the language, they still control the world.

Here's the dataset for GPT-J: https://pile.eleuther.ai/
Most other free models also have publicly available datasets and are fully reproducible.


Error 404 when I try to download the Pile.

But I want to believe.


lol, here's the torrent:
magnet:?xt=urn:btih:0d366035664fdf51cfbe9f733953ba325776e667&dn=EleutherAI_ThePile_v1&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
