They even open-sourced which pretraining dataset they used.
Impressive