Twitter and Reddit have a ton of interesting niche-specific data for training AI models. It also helps that retweets, likes, and upvotes can be used to filter out the noise. There are way more interesting conversations and ideas going on on these platforms vs Facebook or Instagram posts.
I think another interesting one would be Medium or other blog-like platforms. Being able to train AI by triangulating thought leaders' long-form articles and the comments on them offers unique insights beyond Google or Wikipedia.
On that note, podcast transcripts are probably the holy grail for really getting into the weeds. Not sure how licensing works, but since many podcasts are distributed via RSS feeds, I wonder if it’s an open market to download episodes and train on transcripts (similar to YouTube, but I find podcasts to have more signal than videos most of the time).
Books would be very interesting too.