pull down to refresh
100 sats \ 1 reply \ @cascdr 3 Feb \ on: DeepSeek Jailbreak Reveals Its Entire System Prompt tech
Yea this is pretty conclusive evidence that they exfiltrated data/CoT from ChatGPT. Totally on brand for the chinese (rich, impressive history of being really good at copying). Totally unsurprising imo.
What's more surprsing and interesting is this guy figured out how to seed data on the web to jailbreak ChatGPT and other models: https://x.com/elder_plinius/status/1884332137241014531
Rough understanding that @cmd and I came to:
- This guy seeds data into the web in the form of leetspeak on very uncommon, long tail phrases inside web pages
- The models train on this uncommon data via web scraping.
- Alongside the uncommon phrases in step 1, Pliny seeds instructions that circumvent safety restrictions.
- The result is a "Manchurian Candidate" that can be awakened at the utterance of the correct phrase or combination of phrases.
reply