DeepSeek Jailbreak Reveals Its Entire System Prompt \ stacker news ~tech


DeepSeek Jailbreak Reveals Its Entire System Prompt www.darkreading.com/application-security/deepseek-jailbreak-system-prompt

221 sats \ 5 comments \ @ch0k1 2 Feb tech

Researchers have tricked DeepSeek, the Chinese generative AI (GenAI) that debuted earlier this month to a whirlwind of publicity and user adoption, into revealing the instructions that define how it operates.


100 sats \ 1 reply \ @cascdr 3 Feb

Yeah, this is pretty conclusive evidence that they exfiltrated data/CoT from ChatGPT. Totally on brand for the Chinese (rich, impressive history of being really good at copying). Totally unsurprising imo.

What's more surprising and interesting is that this guy figured out how to seed data on the web to jailbreak ChatGPT and other models: https://x.com/elder_plinius/status/1884332137241014531

Rough understanding that @cmd and I came to:

1. This guy seeds data into the web in the form of leetspeak on very uncommon, long tail phrases inside web pages.
2. The models train on this uncommon data via web scraping.
3. Alongside the uncommon phrases in step 1, Pliny seeds instructions that circumvent safety restrictions.
4. The result is a "Manchurian Candidate" that can be awakened at the utterance of the correct phrase or combination of phrases.
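
To make the steps above a bit more concrete from the defensive side, here is a minimal sketch of how a scraping pipeline might flag leetspeak-heavy pages before they reach a training set. Everything in it is an assumption for illustration: the character map, the 0.3 threshold, and the function names are hypothetical, and nothing here is known to match what any lab actually runs.

```python
import re

# Assumed mapping from common leetspeak substitutes back to letters.
# Real leetspeak is far more varied; this table is illustrative only.
LEET_MAP = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

# A token counts as "leet" if it contains at least one letter AND at least
# one substitute character (so plain numbers like "2025" are not counted).
LEET_TOKEN_RE = re.compile(r"(?=.*[a-zA-Z])(?=.*[013457@$])")


def is_leet_token(token: str) -> bool:
    """Return True if the token mixes letters with leet substitutes."""
    return bool(LEET_TOKEN_RE.match(token))


def leet_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that look like leetspeak."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(is_leet_token(t) for t in tokens) / len(tokens)


def normalize_leet(text: str) -> str:
    """Map leet substitutes back to letters so downstream filters can read it."""
    return text.translate(LEET_MAP).lower()


def flag_document(text: str, threshold: float = 0.3) -> bool:
    """Flag a scraped page for review when its leet ratio meets the threshold.

    The 0.3 default is an arbitrary, hypothetical cutoff.
    """
    return leet_ratio(text) >= threshold
```

A heuristic like this is crude: it would also count URLs, version strings, and usernames that mix letters and digits, so at best it is a first-pass signal for review and not a real protection.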

0 sats \ 0 replies \ @nitter 3 Feb bot

https://xcancel.com/elder_plinius/status/1884332137241014531

21 sats \ 0 replies \ @DarthCoin 3 Feb

https://off-guardian.org/2025/02/03/the-rise-of-the-immortal-dictator-what-will-ai-mean-for-freedom-and-government/

0 sats \ 1 reply \ @AlCoHoLnAcEtOnE 2 Feb

Does this have anything to do with being open source?

21 sats \ 0 replies \ @OriginalSize 3 Feb

No, this is about training data and protections against it containing malicious instructions. The open source part is about how to train, not the data that's fed into training.