So, there is a new AI model out there that might be a contender for the smartest one yet: Claude 3.5. We'll see in the next few weeks. But which LLM is actually the smartest? Generally, we answer this question with benchmarks. And just like with CPU benchmarks over the last decades, there are dozens of them, and manufacturers cherry-pick the benchmarks they're good at.
Over the last few months, I think LMSYS has become the most competitive way to benchmark. Why? Because humans (specifically, people interested in AI) come up with far more questions, far more creative questions, and are the best possible judges of the output and logic of foundation models.
But we accidentally created a catch.
- Just like Intel back in the day shipped CPUs with a bazillion GHz but little real performance gain, we've created a benchmark that rewards appealing to humans rather than objective intelligence. Humans will downvote any attempt by an AI to become smarter than humans as misinformation. Because it is - to the best of our understanding.
- This is a prime target for exploitation. The humans voting here now have a big influence. AFAIK (and I searched the web a little) this is a novel idea: what if an organized group of humans votes on LMSYS with an agenda in mind? I personally like photography, so I could ask a lot about lenses and analog film to make a small dent in ensuring future AIs are smart on that topic. Bland example, I know. But maybe you personally have something more in mind? And maybe there are groups out there with a more important agenda?
- There is no going back now. Now that LMSYS exists, we will never go back to simple string-contains benchmarks. They will surely continue to exist internally at AI companies, but no one will ever again create the Geekbench or Cinebench of LLMs with public standing. We have taken the first step toward losing control.
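To see why a coordinated voting bloc matters, here is a minimal sketch of how pairwise human votes turn into Elo-style ratings. This is illustrative only: LMSYS actually fits a Bradley-Terry model over all votes, and the K-factor, starting ratings, and sequential updates below are my own assumptions, not their parameters.

```python
# Minimal Elo-style rating update from pairwise votes.
# K=32 and the starting rating of 1000 are illustrative assumptions,
# not the actual LMSYS / Chatbot Arena parameters.

def expected_score(r_a, r_b):
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings, winner, loser, k=32):
    """Apply one human vote: winner beat loser."""
    e_w = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1 - e_w)
    ratings[loser] -= k * (1 - e_w)

ratings = {"model_a": 1000.0, "model_b": 1000.0}

# An organized group casting 50 votes for model_a in a row
# drags the ratings apart even if the models are equally good.
for _ in range(50):
    update(ratings, winner="model_a", loser="model_b")

print(ratings)  # model_a ends well above 1000, model_b well below
```

The point of the sketch: each vote moves the ratings, and nothing in the update distinguishes an honest preference from an agenda-driven one, so a small organized bloc can open a real gap between two equally good models.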
Conclusion: Idk what to do now. This seems simultaneously like a small irrelevant insight and something that shapes humanity forever. As we move forward, we should recognize the implications of our actions and consider the long-term consequences of ceding control over AI evaluation. Nobody else seems to consider such basic game logic here. Go vote on https://chat.lmsys.org/ or something, idk.