We introduce adversarial confusion attacks as a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted misclassification, the objective is to induce systematic malfunction: incoherent or confidently wrong text. The long-term goal is to craft adversarial confusion images that can be embedded into websites (e.g., as background elements), thereby preventing MLLM-powered AI agents from operating on those sites. In this paper we present preliminary results limited to the direct MLLM setting, where adversarial images are submitted as model inputs. Our method maximizes next-token entropy under expectation over transformations (EoT) and a small ensemble of open-source MLLMs. The attack transfers to some proprietary models (GPT-5-high, GPT-o3, and Amazon Nova Pro), which produce structurally coherent hallucinations. This is a work in progress; code will be released with the final version.
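To make the stated objective concrete, the following is a minimal sketch of the kind of optimization the abstract describes: PGD-style gradient ascent on mean next-token entropy, averaged over random input transformations (EoT) and an ensemble of open-source MLLMs. The model-wrapper interface, transformation choice, and all hyperparameters (eps, alpha, step counts) are assumptions for illustration, not the paper's released code.

```python
# Illustrative sketch only; model wrappers and hyperparameters are assumed.
import torch
import torch.nn.functional as F


def next_token_entropy(logits):
    """Entropy of the next-token distribution, averaged over the batch."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1).mean()


def random_transform(img):
    """A simple differentiable EoT transform (assumed): rescale + small noise."""
    scale = 0.9 + 0.2 * torch.rand(1).item()
    size = [int(s * scale) for s in img.shape[-2:]]
    out = F.interpolate(img, size=size, mode="bilinear", align_corners=False)
    out = F.interpolate(out, size=img.shape[-2:], mode="bilinear", align_corners=False)
    return (out + 0.01 * torch.randn_like(out)).clamp(0, 1)


def confusion_attack(image, models, prompt_ids, steps=200,
                     eps=8 / 255, alpha=1 / 255, eot_samples=4):
    """Perturb `image` within an L_inf ball of radius eps so that every model's
    next-token distribution becomes maximally uncertain (maximum entropy)."""
    x_adv = image.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = 0.0
        for _ in range(eot_samples):
            x_t = random_transform(x_adv)
            for model in models:
                # Assumed wrapper: returns [batch, vocab] next-token logits
                # for the image plus a fixed text prompt.
                logits = model(x_t, prompt_ids)
                loss = loss + next_token_entropy(logits)
        loss = loss / (eot_samples * len(models))
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv += alpha * grad.sign()                          # ascend on entropy
            x_adv.clamp_(image - eps, image + eps).clamp_(0, 1)   # project to constraints
        x_adv.requires_grad_(True)
    return x_adv.detach()
```

Averaging the entropy objective over both the transformation distribution and the model ensemble is what is typically hoped to buy robustness to preprocessing and transfer to unseen (including proprietary) models.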