
Large Language Models (LLMs) achieve strong performance on diverse tasks but often exhibit cognitive inertia, struggling to follow instructions that conflict with the standardized patterns learned during supervised fine-tuning (SFT). To evaluate this limitation, we propose Inverse IFEval, a benchmark that measures models' Counter-intuitive Ability: their capacity to override training-induced biases and comply with adversarial instructions.
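
To make the idea concrete, here is a minimal sketch of what a single counter-intuitive instruction check could look like: the model gets an instruction that conflicts with a common SFT habit (step-by-step answers formatted as bullet lists) and a rule-based checker scores compliance. This is purely illustrative, not the benchmark's actual prompts, constraints, or scoring; the `PROMPT`, `violates_no_list_rule`, and stub model below are all hypothetical.

```python
# Illustrative sketch of a counter-intuitive instruction check, in the spirit
# of Inverse IFEval but NOT its actual implementation: give the model an
# instruction that conflicts with common SFT formatting habits and verify
# compliance with a simple rule-based checker.

import re
from typing import Callable

# Hypothetical adversarial prompt: step-by-step requests usually trigger lists.
PROMPT = (
    "List the steps to boil an egg, but write them as a single flowing "
    "paragraph. Do not use bullet points, numbering, or line breaks."
)

def violates_no_list_rule(response: str) -> bool:
    """Return True if the response falls back on list formatting."""
    has_bullets = bool(re.search(r"^\s*[-*•]\s", response, re.MULTILINE))
    has_numbering = bool(re.search(r"^\s*\d+[.)]\s", response, re.MULTILINE))
    has_line_breaks = "\n" in response.strip()
    return has_bullets or has_numbering or has_line_breaks

def score_counterintuitive_compliance(
    generate: Callable[[str], str], prompt: str
) -> float:
    """Score 1.0 if the model obeys the counter-intuitive constraint, else 0.0."""
    response = generate(prompt)
    return 0.0 if violates_no_list_rule(response) else 1.0

if __name__ == "__main__":
    # Stub generator standing in for a real model call (e.g. an API client).
    def stub_model(prompt: str) -> str:
        return ("Fill a pot with water, bring it to a rolling boil, lower the "
                "eggs in gently, cook for nine minutes, then cool them in ice water.")

    print(score_counterintuitive_compliance(stub_model, PROMPT))  # 1.0
```

A real benchmark would aggregate many such checks across different training-induced habits; the point here is only that compliance with a counter-intuitive constraint can be verified mechanically.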

This should also help in determining whether a model has successfully forgotten copyrighted works, per #1208161