
If you're curious about how they do these kinds of tests, there are crowdsourced ones for video and image generation available; see, for example, this video "arena". It's basically A/B testing, and it's subjective. (Yes, also with LLM-as-judge; then it's simulated subjectivity.)
Greater than 50% means they tended to rank AI as doing better work than humans.
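For anyone who wants to see what that percentage boils down to, here is a rough sketch in Python. The vote labels ("ai", "human", "tie") and the choice to skip ties are my own assumptions for illustration, not how any particular arena actually scores things.

```python
from collections import Counter

def win_rate(votes, model="ai"):
    """Fraction of decided pairwise votes won by `model`.

    `votes` holds one hypothetical label per comparison: "ai", "human",
    or "tie". Ties are skipped (an assumption, not a standard).
    """
    counts = Counter(votes)
    decided = sum(n for winner, n in counts.items() if winner != "tie")
    if decided == 0:
        raise ValueError("no decided votes")
    return counts[model] / decided

# Made-up votes: each entry is one rater picking the output they preferred.
votes = ["ai", "human", "ai", "tie", "ai", "human"]
print(f"AI win rate: {win_rate(votes):.0%}")  # prints "AI win rate: 60%"
```

Note that the number only says which output raters preferred, not whether it was correct.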
I'd argue that this is better-looking work, not better work in an absolute sense. Since an LLM can generate faster than a human, it will always give a more complete output, but it is much more error-prone.
If you want profound results with LLMs, which is absolutely possible, then you give a human expert access to an LLM. This is because a human expert knows how to approach a problem and ask the right follow-up questions. LLMs are very much garbage in, garbage out systems, and most have ingested tons of garbage. The prompt is the filter (with a bit of luck), and this is why "prompt engineering" is a thing, albeit a dumb thing.
Now the question is: can we train LLMs to an acceptable level of proficiency in directing LLMs (train them in management, QA, and so on)? I think that the answer is yes. I even think that this is a worthy goal and I'd like to have this. Preferably running on one of those NVMe tensor cards, and then we just build an RPi cluster of workers: build the hive. I think it'd be awesome, but I think that concepts like "AI as persona", "AI is smarter than humans", "AGI", "buy muh subscription", "we be serving you ads" are distractions from creating better tooling.
It feels to me like when social media was bursting onto the scene. I mostly dismissed it because I didn't see the utility and I didn't trust the promoters. Yet lately I've come around, and I see that there may be some utility here. It may be an open question whether it is a net benefit, but it certainly is a powerful tool to do something.
Can you elaborate on this, specifically:
  • What social media platform?
  • Can you define what you mean by utility?
Platform: Let's start when Facebook and then Twitter came out. My impression at the time was that these were ego-stroking tools for people who had nothing better to do. This isn't because I didn't like tech or the internet -- blogging was one of my first loves (probably even before girls). But I dismissed social media for a long time because I generally don't like self-aggrandizing, and I assumed that was what the kind of people who used social media used it for.
My memory of when these things got started (especially headliners like Facebook and Twitter) is that there was more emphasis on social than on media. I regret that I didn't spend time learning how to use them in these early stages. I also wish I had asked myself how the world would look if social media stuck around and even became integral to daily life, rather than being reluctantly dragged into usage before realising: Ah! There is something here.
Utility: In the last decade or so, social media tools (forums like this one, Reddit (less so), things like X or LinkedIn, Telegram -- never yet found a use for Facebook) have been hugely useful to me. First, as tools for learning. Second, in helping me to build relationships. Third, I've gotten all my jobs and gigs through social media, as has my wife. Even though there is a huge amount of fluff, the connective power of social media, with its massive availability of information and door-opening access to people, is a wonder.

Your points about better looking work and the distraction of AI personas or AGI faff are good. And, if anything, that's the tone that society certainly needs, given its current understanding of AI as a scifi character rather than a tool.
Comparing AI tools to humans to see who is "better" doesn't seem useful, but comparing them to us to figure out what AI might be good at does.
Finally: "buy muh subscription", "we be serving you ads" -- this is where my pessimism comes in. I don't see how we end up anywhere else. General populations, even businesses, have demonstrated they mostly do not want to run their own infrastructure. The only exception to this is routers (people seem willing to plug in a device, but are very unlikely to change any settings or do something like flash their own firmware onto it). If email can be taken as a model, there is no world where most people run their own models, nor one where they even care to. This is true for Bitcoin as well. Unless we can find an incentive that motivates more than a few crazy individuals (perhaps if the censorship state comes quick and hard, it would generate enough of a backlash to create a culture of people who care about personal sovereignty in their devices) to desire control over the tools they use, we'll probably end up with captured AI like captured email. I suppose the captured AI future looks even uglier than the captured email future and doesn't leave us in a very nice state.
I was on FB and LinkedIn rather early, before there were a million users on either, because my friend told me it was cool in the former case and one of my colleagues was moving there in the latter. Both did feel novel and cool at the time; FB was definitely more social than networking, and LinkedIn more networking than social. I am no longer on either. Twitter became useful for me in 2010 or so, I think. I've had great DM conversations on there and definitely some social discovery through just reading what people are up to, but the algo screwed things up for me. Left that too.
I do get the gig part - I scored 3 jobs and found a co-founder through Reddit over a decade ago, before it was shitty. But honestly, I've gotten way more gigs from having drinks at conference afterparties, maybe 5x, even though I've spent much more time on Reddit alone than at conferences including the boring part where you're not drinking. So, I'm rather skeptical about the efficiency of social media and whether it lives up to the promises made. Too much noise, not enough signal.

General populations, even businesses, have demonstrated they mostly do not want to run their own infrastructure. [..]
Mind you, I'm not advocating against services. I'm advocating against closed-source SaaS. There are plenty of 3rd party providers of open-source software. Even VPSes: you don't need to run your own Xen hypervisor or k8s (or both). There are plenty of providers for this.
[..] perhaps if the censorship state comes quick and hard, it would generate enough of a backlash to create a culture of people who care about personal sovereignty in their devices
This has already started in Europe. I thought it was just small businesses, but I've been amazed to learn of some major businesses working on completely de-SaaS-ing, mostly away from US dependencies. My main worry is reserved for those in emerging or dependent economies, as they don't have so many options, and a dependency on an EU corporation is as bad as a dependency on a US one. I still think we can make it work, but it's going to be a tough couple of years ahead.