I saw a paper arguing that the models are cheating by learning the exact test questions: if you add extraneous information to a question a model previously answered correctly, it gets confused by the extra information and answers wrong.
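For anyone who wants to try the idea, here is a rough sketch of that kind of check, not the paper's actual method. Everything in it is made up for illustration: `memorizing_model` is a toy stand-in for a model that has memorized exact question strings, and `accuracy_drop`, the questions, and the distractor sentence are all hypothetical.

```python
from typing import Callable, List, Tuple

Item = Tuple[str, str]  # (question, expected answer)


def accuracy_drop(
    model: Callable[[str], str], items: List[Item], distractor: str
) -> Tuple[float, float]:
    """Return (clean accuracy, accuracy with an irrelevant sentence prepended)."""
    clean = sum(model(q) == a for q, a in items) / len(items)
    perturbed = sum(model(f"{distractor} {q}") == a for q, a in items) / len(items)
    return clean, perturbed


# Hypothetical toy "model": it only knows the exact strings it has seen before.
MEMORIZED = {"What is 2 + 3?": "5", "What is 4 + 4?": "8"}


def memorizing_model(prompt: str) -> str:
    return MEMORIZED.get(prompt, "unknown")


ITEMS = list(MEMORIZED.items())
clean, perturbed = accuracy_drop(memorizing_model, ITEMS, "Note that the sky is blue.")
print(clean, perturbed)  # 1.0 0.0
```

A pure memorizer falls from perfect to zero as soon as the wording changes. A real model would land somewhere in between, and a large gap between the two numbers is the kind of signal the paper is pointing at.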
That is what I was thinking the other day when looking at the benchmark results. Are model trainers pulling a VW on the benchmarks?