Achieving 10,000x training data reduction with high-fidelity labels \ stacker news ~AI

Achieving 10,000x training data reduction with high-fidelity labels research.google/blog/achieving-10000x-training-data-reduction-with-high-fidelity-labels/

232 sats \ 3 comments \ @carter 22h AI

68 sats \ 2 replies \ @optimism 22h

Another one of these and I can train the next frontier model on my RPi2B!

100 sats \ 1 reply \ @carter OP 22h

It seems like they are using existing statistical techniques to filter the data down to the examples that will be most impactful for training, then pulling good examples from each group... Very cool system

200 sats \ 0 replies \ @optimism 21h

I was looking at these κ charts and wondered whether .38 for higher complexity and .56 for lower complexity is a great result, if human experts reach .78 and .81 among themselves?

I knew I'd seen a paper about this: https://arxiv.org/abs/2501.08167, but it's kinda stone age:

| Comparisons | Percentage Agreement | Cohen’s Kappa |
|---|---|---|
| Human vs Claude 2.1 Ratings | 79% | 0.41 |
| Human vs Titan Express Ratings | 78% | 0.35 |
| Human vs Sonnet 3.5 Ratings | 76% | 0.44 |
| Human vs Llama 3.3 70b Ratings | 79% | 0.39 |
| Human vs Nova Pro | 76% | 0.34 |
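
For anyone wondering how ~79% agreement can sit next to a kappa of only ~0.41: Cohen's kappa subtracts the agreement two raters would reach by chance, so when one label dominates, raw agreement looks much better than kappa. Here is a minimal sketch in plain Python. The label counts are hypothetical, picked by me to land near those numbers; they are not taken from the paper or from Google's post.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which both raters gave the same label."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters on the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must label the same non-empty set of items")
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for 200 items with a skewed label distribution:
# 25 both "yes", 21 human-only "yes", 21 model-only "yes", 133 both "no".
human = ["yes"] * 25 + ["yes"] * 21 + ["no"] * 21 + ["no"] * 133
model = ["yes"] * 25 + ["no"] * 21 + ["yes"] * 21 + ["no"] * 133

agreement = percent_agreement(human, model)
kappa = cohen_kappa(human, model)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

With these made-up counts, most of the agreement comes from both raters saying "no" on the majority class, which chance alone would largely produce, so kappa ends up far below the raw agreement rate.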

Looks awesome if we realize that Google's results were with a 3.25B model, but the evaluation data provided in the paper was "a mockup", so we don't know if this is apples-to-apples. Nevertheless, I'm a big fan of "less junk in".