25 sats \ 0 replies \ @Scoresby OP 3h \ parent \ on: GDPval: Measuring the performance of our models on real-world tasks - OpenAI AI
Well, a lot of this really depends on who the "expert" humans in the blind test were.
I read 50% on the chart to mean that it is a coin toss whether graders thought the human or the AI did better work. Less than 50% means graders tended to rank the AI's work as worse than the humans'. Greater than 50% means they tended to rank the AI's work as better than the humans'.
So the important factor is whether the humans the AI was graded against were "top 2%" kind of people.
Also, the point about AI failure being more likely to be catastrophic is valid.
Finally, I'd say I have no doubt that OpenAI is pumping their own bags with a sales pitch in every piece of info they put out. But even so, there is something here.
It feels to me like when social media was bursting onto the scene. I mostly dismissed it because I didn't see the utility and I didn't trust the promoters. Yet lately I have come to see that there may be some utility there. It may be an open question whether it is a net benefit, but it certainly is a powerful tool to do something. I see AI in the same light (and perhaps I'm just scared of repeating what I now see as a mistake in my attitude toward social media).