It seems like they are using existing statistical techniques to filter the data down to the examples that will be most impactful for training, and to pull good examples from the groups... Very cool system.
Looks awesome when you consider that Google's results were with a 3.25B model. However, the evaluation data provided in the paper was "a mockup", so we don't know if this is an apples-to-apples comparison. Nevertheless, I'm a big fan of "less junk in".