Building AdvancedIF: Evolving Instruction Following Beyond “Avoid the Letter C” \ stacker news

Building AdvancedIF: Evolving Instruction Following Beyond “Avoid the Letter C” surgehq.ai/blog/advancedif-and-the-evolution-of-instruction-following-benchmarks

187 sats \ 3 comments \ @tonyaldon 14 Dec 2025 AI

11 sats \ 2 replies \ @optimism 14 Dec 2025

I really like that they didn't do the lazy "GPT gief bench" but actually went through the human labor to create a great set.

paper
dataset

101 sats \ 1 reply \ @tonyaldon OP 14 Dec 2025

Maybe this is why it works

51 sats \ 0 replies \ @optimism 14 Dec 2025

I'm not 100% convinced, because it doesn't show gpt-5's regression in IF, which showed up not just in benchmarks but also in real usage. But maybe that is because it is still just one method: rigorous testing means applying multiple frameworks, as is done for device safety, for example.