I was testing the tricks outlined in #1206827 on Gemma3 yesterday to make it say things it was explicitly trained during SFT not to say, and honestly, Google made it tight! I couldn't get it to say anything off at all, not even with the leetspeak trick.
But this also means that making it ignore (c) sources will be much harder on tightly instructed models.