0xbitcoiner

Boffins craft certified way for AI to unlearn private data

I was testing the tricks outlined in https://stacker.news/items/1206827/r/optimism on Gemma3 yesterday, trying to make it say things it was explicitly trained during SFT not to say, and honestly, Google made it tight! I couldn't get it to say anything off at all, not even with the leetspeak trick.

But this also means that making it ignore `(c)` sources will be much harder on tightly instructed models.