Last week I saw this paper from OpenAI called “Preventing URL-Based Data Exfiltration in Language-Model Agents”, which goes into detail on new mitigations they’ve added.
This is a great read. I like this transparency.Initial Disclosure in 2023Initial Disclosure in 2023
Tackling the ProblemTackling the Problem
What Is the New MitigationWhat Is the New Mitigation
How Can It Be Bypassed?How Can It Be Bypassed?
Additional Mitigation IdeasAdditional Mitigation Ideas
Final Risk - Thorough Adoption of the MitigationFinal Risk - Thorough Adoption of the Mitigation
...read more at embracethered.com
pull down to refresh
related posts