Although this is an interesting paper on its own, it highlights something that many people I discuss LLMs with don't seem to have come to terms with: LLMs are training and judging LLMs. In particular, GPT is judging other models:
This shows that researchers are not building great datasets by hand; they feed wiki pages into GPT and then run a series of refinement and filtering passes, hoping to end up with a great set. Or maybe not. No one knows. Who needs precision if you can sit back all day while the LLM does everything?
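For the skeptical, here is a minimal sketch of what such a generate-then-judge pipeline tends to look like, assuming the OpenAI Python SDK. The model names, prompts, and the score threshold are my own illustrative guesses, not anything taken from the paper:

```python
# A minimal sketch of the generate-then-judge loop described above.
# Model names, prompts, and the 7/10 cutoff are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_qa(wiki_page: str) -> str:
    """Ask one LLM to turn a wiki page into a question-answer pair."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write one question and answer based on:\n{wiki_page}",
        }],
    )
    return resp.choices[0].message.content


def judge(candidate: str) -> int:
    """Ask a second LLM to score the pair; no human ever looks at it."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Rate this QA pair 1-10. Reply with the number only:\n{candidate}",
        }],
    )
    return int(resp.choices[0].message.content.strip())


def build_dataset(wiki_pages: list[str], threshold: int = 7) -> list[str]:
    """Keep whatever the judge likes. Maybe a great set. Or maybe not."""
    candidates = (generate_qa(page) for page in wiki_pages)
    return [c for c in candidates if judge(c) >= threshold]
```

Note that the filter is only as good as the judge model's taste; nothing here ever checks the output against ground truth.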
This is why AI moves so fast: every step of the way is done by, or at the very least aided by, AI. But it is also why some things are very hard to get rid of, like biases and stock phrases like "yOu'Re AbSoLuTeLy RiGhT!" when the user is actually absolutely wrong.
It's garbage in, garbage out. So if you want excellence, humanity will somehow have to put in the work, and it's not going to be cheap.