Wow! Nice share.
Anyone else notice this part of the article:
The mathematicians who participated had to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset.
Maybe they were just acting out of an abundance of caution, but I was wondering: how would o4 scan their email? Or perhaps they were worried that another model could scan the emails and then... publish something online that o4 could read?
I guess it's pretty safe to assume that everything typed into a keyboard will become training data at some point.