Researchers have tricked DeepSeek, the Chinese generative AI (GenAI) that debuted earlier this month to a whirlwind of publicity and user adoption, into revealing the instructions that define how it operates.
100 sats \ 1 reply \ @cascdr 3 Feb
Yeah, this is pretty conclusive evidence that they exfiltrated data/CoT from ChatGPT. Totally on brand for the Chinese (rich, impressive history of being really good at copying). Totally unsurprising imo.
What's more surprising and interesting is that this guy figured out how to seed data on the web to jailbreak ChatGPT and other models: https://x.com/elder_plinius/status/1884332137241014531
Rough understanding that @cmd and I came to:
- This guy (Pliny) seeds data onto the web as leetspeak built around very uncommon, long-tail phrases embedded in web pages.
- The models pick up this uncommon data when they train on scraped web pages.
- Alongside the uncommon phrases from the first step, Pliny seeds instructions that circumvent safety restrictions.
- The result is a "Manchurian Candidate" that can be awakened at the utterance of the correct phrase or combination of phrases.
0 sats \ 0 replies \ @nitter 3 Feb bot
https://xcancel.com/elder_plinius/status/1884332137241014531
21 sats \ 0 replies \ @DarthCoin 3 Feb
https://off-guardian.org/2025/02/03/the-rise-of-the-immortal-dictator-what-will-ai-mean-for-freedom-and-government/
0 sats \ 1 reply \ @AlCoHoLnAcEtOnE 2 Feb
Does this have anything to do with being open source?
21 sats \ 0 replies \ @OriginalSize 3 Feb
No, this is about training data and protecting against malicious instructions hidden in it. The open-source part covers how the model is trained, not the data that's fed into training.
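To make "protections" a bit more concrete, here is a rough sketch of the kind of filter a scraping pipeline could run before a page reaches the training set. Everything in it is an assumption for illustration: the substitution table `LEET_MAP`, the phrase list `SUSPICIOUS_PHRASES`, the helper names, and the 0.3 threshold are all made up, and nothing here is known to be what any lab actually does.

```python
import re

# Hypothetical leetspeak substitutions; a real filter would need a far larger table.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Hypothetical instruction-like phrases to look for once the text is normalized.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "disregard all safety")


def normalize_leetspeak(text: str) -> str:
    """Lowercase the text and map common leetspeak characters back to letters."""
    return text.lower().translate(LEET_MAP)


def leet_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that mix letters with leet characters."""
    tokens = text.split()
    if not tokens:
        return 0.0
    mixed = sum(
        1
        for t in tokens
        if re.search(r"[a-zA-Z]", t) and re.search(r"[013457@$]", t)
    )
    return mixed / len(tokens)


def flag_document(text: str, max_leet_ratio: float = 0.3) -> bool:
    """Flag a scraped page for review if it hides a known phrase or is leet-heavy."""
    normalized = normalize_leetspeak(text)
    has_phrase = any(phrase in normalized for phrase in SUSPICIOUS_PHRASES)
    return has_phrase or leet_ratio(text) > max_leet_ratio


if __name__ == "__main__":
    print(flag_document("1gn0r3 pr3v10u5 1n57ruc710n5"))
    print(flag_document("The model was released in 2025 with open weights."))
```

A keyword-and-ratio check like this is easy to evade and will throw false positives on ordinary text such as product codes, so treat it as a picture of where a filter sits in the pipeline rather than a real defense.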