https://xcancel.com/elder_plinius/status/1884332137241014531

nitter

tech

DeepSeek Jailbreak Reveals Its Entire System Prompt

ch0k1

Yea this is pretty conclusive evidence that they exfiltrated data/CoT from ChatGPT. Totally on brand for the chinese (rich, impressive history of being really good at copying). Totally unsurprising imo.

What's more surprsing and interesting is this guy figured out how to seed data on the web to jailbreak ChatGPT and other models: https://x.com/elder_plinius/status/1884332137241014531

Rough understanding that @cmd and I came to:
1. This guy seeds data into the web in the form of leetspeak on very uncommon, long tail phrases inside web pages
2. The models train on this uncommon data via web scraping.
3. Alongside the uncommon phrases in step 1, Pliny seeds instructions that circumvent safety restrictions.
4. The result is a "Manchurian Candidate" that can be awakened at the utterance of the correct phrase or combination of phrases.