Alby... Next time please make the announcement on SN and post the SN link on twatter...
That is the RIGHT way to do it.
Had 2 steaks while in Germany. One was steak frites, the other this ribeye; however, the ribeye was Australian wagyu, so not a totally fair evaluation.
Overall 6/10, nothing special but I enjoyed the steak more than I have in other European countries.

I'm working on the longer post I promised @BlokchainB on this. Will probably sleep on it though, because I want it to at least be watertight and needleproof against my own scrutiny before y'all tear it down lol
"Crossing 50% does not mean you are better than a human even at the included tasks, since the AI models will have a higher rate of correlated, stupid or catastrophic failure."
This is why I feel that this is all a sales pitch. Also, I don't hire in the bottom 50%. I hire in the top 2%. Get 100 resumes, burn 98, invite 2, hire 1.
The other thing is that I'd be Gell-Mann-amnesia-style betraying my own conscience by believing this, as just this morning I got code that didn't work when I tried something. And it wasn't even that hard to do it right. So expert level? No. Only if you are a lil yolo bitch with a big mouth on twitter who calls themselves an expert. In that case, you shall lose your internet credits. Preferably yesterday.
It's important to remember that the AI assistants you use today are based on LLMs that do not update their parameters. The chatbot you interact with is a creation of an LLM with a specific set of parameters, and no amount of prompting or context changes them.
I agree that prompting doesn't change the underlying model, but let's not forget that it does change the runtime results. A living prompt, such as a continuously injected AGENTS.md a la Cursor, will actually both reinforce paths (given that no crazy randomness is introduced through high temperature) and allow the user and/or the LLM itself to evolve the results by editing that file. Of course, there is still a chance that it goes wrong; it happened to me once out of 500 or so runs with Claude this weekend, where a wrong tool call was made.
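The "living prompt" idea boils down to re-reading the rules file and prepending it to every request, so edits to the file change the next call's behavior while the model's weights stay frozen. A minimal sketch of that loop, assuming a generic chat-message format (the file name and the message shape are illustrative, not any particular tool's API):

```python
from pathlib import Path

def build_messages(user_prompt: str, rules_path: Path = Path("AGENTS.md")) -> list[dict]:
    """Assemble chat messages, re-reading the rules file on every call.

    The model's parameters never change; only the injected context does,
    so editing the file between calls immediately steers later results.
    """
    messages = []
    if rules_path.exists():
        # Inject the current file contents as the system message.
        messages.append({"role": "system", "content": rules_path.read_text()})
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

Editing AGENTS.md between two calls changes only the system message of the second call, which is all the "evolution" here amounts to.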
Pinocchio's parameters must receive feedback from real life and be altered by it.
So do they? I'm still unconvinced. If one were to stop training an LLM to be a know-it-all and instead train it to use tools and incorporate feedback from the context - which, it feels to me, is much of what Anthropic figured out with Claude - do we really need to retrain the models? #1136016 also claims it isn't needed per se.
My comment on Twitter:
Not good.
Anyway, the only ethical option for Signal if this passes is to refuse to comply. Signal shouldn't even block the EU: let the EU block them.
It should be the only legal option too: implementing Chat Control is a crime against humanity. The US should explicitly criminalize compliance.
One crazy thing about Chat Control is that Signal is heavily used in Europe for military communications. They're proposing to backdoor the exact same system that they rely on for national security.
The EU bureaucrats pushing this are just psychopaths who want more power.
This is the key sentence that kinda makes this evaluation not super useful:
Additionally, in the real world, tasks aren’t always clearly defined with a prompt and reference files; for example, a lawyer might have to navigate ambiguity and talk to their client before deciding that creating a legal brief is the right approach to help them. We plan to expand GDPval to include more occupations, industries, and task types, with increased interactivity, and more tasks involving navigating ambiguity, with the long-term goal of better measuring progress on diverse knowledge work.
Part of the human expert's work is to define the problem and collect the relevant information needed. The AI didn't have to do any of that.
Moreover, the article didn't talk about whether the AI's work product was actually put into a productionized environment. For example, were the real estate listings actually posted automatically onto Redfin/Zillow? Another part of the human's work is to navigate the many different tools and platforms and conform inputs and outputs to the expected format, and interoperate between many technologies. Not sure if the AI can do that autonomously yet.
Can you elaborate on this, specifically
Platform
Let's start when Facebook and then Twitter came out. My impression at the time was that these were ego-stroking tools for people who had nothing better to do. This isn't because I didn't like tech or the internet -- blogging was one of my first loves (probably even before girls). But I dismissed social media for a long time because I generally don't like self-aggrandizing, and I assumed that it is what the kind of people who used social media used it for.
My memory of when these things got started (especially headliners like Facebook and Twitter) is that there was more emphasis on social than on media. I regret that I didn't spend time learning how to use them in those early stages, or ask myself how the world would look if social media stuck around and even became integral to daily life, rather than being reluctantly dragged into usage before realising: Ah! There is something here.
Utility
In the last decade or so, social media tools (forums like this one, Reddit (less so), things like X or LinkedIn, Telegram - I've never yet found a use for Facebook) have been hugely useful to me. First, as tools for learning. Second, as a way to build relationships. Third, I've gotten all my jobs and gigs through social media, as has my wife. Even though there is a huge amount of fluff, the connective power of social media, its massive availability of information, and its door-opening access to people is a wonder.
Your points about "better looking work" and the distraction of AI personas or AGI faff are good. And, if anything, that's the tone that society, in its current understanding of AI as a sci-fi character rather than a tool, certainly needs. Comparing AI tools to humans to see who is "better" doesn't seem useful, but comparing them to us to figure out what AI might be good at does.
Finally: "buy muh subscription", "we be serving you ads" - this is where my pessimism comes in. I don't see how we end up anywhere else. General populations, even businesses, have demonstrated that they mostly do not want to run their own infrastructure. The only exception is routers: people seem willing to plug in a device, but are very unlikely to change any settings or do something like flash their own firmware onto it.
If email can be taken as a model, there is no world where most people run their own models, nor one where they even care to. This is true for Bitcoin as well... Unless we can find an incentive that motivates more than a few crazy individuals to desire control over the tools they use (perhaps if the censorship state comes quick and hard, the backlash would create a culture of people who care about personal sovereignty in their devices), we'll probably end up with captured AI, like captured email. And I suppose the captured AI future looks even uglier than the captured email future, and doesn't leave us in a very nice state.
-
For my research, I recently had to take long PDFs that contained multiple documents smushed together into one PDF file (mostly letters and reports), and find the document boundaries. All the AI tools I tried did a pretty bad job at that, but it's something a human could have done easily.
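For what it's worth, a dumb heuristic gets surprisingly far on this task: scan the top of each page's extracted text for openings that typically start a new letter or report ("Dear ...", "Re:", "MEMORANDUM", a long-form date) and treat matching pages as candidate boundaries. A rough sketch over already-extracted page texts (the patterns are my assumptions and would need tuning per corpus):

```python
import re

# Patterns that often mark the first page of a new letter or report.
# These are illustrative guesses; real corpora need their own list.
START_PATTERNS = [
    re.compile(r"^\s*Dear\b", re.MULTILINE),
    re.compile(r"^\s*(Re|Subject)\s*:", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^\s*MEMORANDUM\b", re.MULTILINE),
    re.compile(r"\b\d{1,2}\s+(January|February|March|April|May|June|July|"
               r"August|September|October|November|December)\s+\d{4}\b"),
]

def candidate_boundaries(page_texts: list[str]) -> list[int]:
    """Return 0-based page indices that look like the start of a new document.

    Page 0 is always a start; a later page qualifies if any pattern
    matches near the top of its extracted text.
    """
    starts = [0] if page_texts else []
    for i, text in enumerate(page_texts[1:], start=1):
        head = text[:400]  # only inspect the top of the page
        if any(p.search(head) for p in START_PATTERNS):
            starts.append(i)
    return starts
```

It won't handle reports whose first page looks like a continuation, which is exactly where a human (or a better tool) still has to step in.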
-
Check out AI's attempts to draw ascii art: #1031420
Aye, it's really the only option (maybe others like BTCD or Libbitcoin since they're not in positions to be nudging anything)
I have to dismiss Knots v Core takes like the article because they're inherently ignorant of the fact that, just like the gold example, Knots and Core are the same (wrong) side... Which is not monetary maximalism, but application stack maximalism.
The only way to prevent that from happening is so many economic nodes rejecting application blocks that miners start to feel the burn of incentives. That means rejecting "enhancement" of something that already works.
Bitcoin is by design not a technical system, but an incentive one.
Every time I see a "bombshell" that uncovers "spying" I'm always extremely underwhelmed.
Every person is being spied on 100% of the time by an entire nexus of intelligence agencies and corporate partners. These limited hangouts that expose some minor, isolated case seem like they're just gaslighting the public into thinking it's not happening all the time.
To the many bitcoin engineers working hard every day to build a future they believe in, how will you decentralize it?
This is an interesting way to end the article.
The majority of everyday people will never care about self-custody. They don't want or need it, and will never jump through the hoops.
In fact it is dangerous to sell people on self-custody when they don't value it themselves, because it often ends in a horror story of someone losing all their funds.
IMO, the "store of value" sales pitch is a massive LARP. Bitcoin is designed and built to be a payment protocol. It stores value for the purpose of payments, not your retirement account. There's zero protection against being hacked, phished, or losing your information.
Bitcoin is valuable because the code works as advertised and the tools are useful to those that need them. So the "builder's responsibility" really should be towards improving mining and payments, not self-custody.
As long as mining remains profitable and decentralized so that any transaction can find its way into a block in a reasonable amount of time, Bitcoin will be fine. Everything else is fiat games and kabuki theater.
I have bigger issues: how the heck will I build a wall with these big boulders... I had to use some ancient techniques. Need to find another way.
A great way to bring up golden oldies is to use search and link references in posts and comments to contextualize yours. From SN, I'd expect search to be prominent and functional across as many aspects as possible, from relevance to date filters, etc. Not saying it isn't already; I just sometimes don't find what I'm looking for. Still, it is a great and underestimated tool.
Related items also do a great job, but most of the time the items listed are unrelated to the main topic. For example, on this post I get:
- #281689 What did Satoshi move on to after Bitcoin?
- #821035 cure53 audits noble-ciphers and noble-curves
- #599176 Paranemo Activity?
- #482635 The most Fiat advertising campaign ever
- #933741 What's the oldest thing you own and still use it?
🤔 How are these related to this topic? I'd like to know... Indeed, if I run a search I can find some more related posts:
- #66361 Thoughts on tipping older posts & evergreen content
- #990241 Golden Oldies #47
- #361591 Old posts in territories?
- #471053 Post the oldest photo on your phone
- #773926 Golden Oldies #20
- #665609 TIL “This Day on SN” are posts from the past
Golden Oldies appears twice, kudos to @siggy47! I'd also have expected to see more "This Day on SN" posts in the search results. Maybe the keywords don't match?
Even for authors seeking rent, I imagine that having their work in the LLMs' knowledge base is worth much more than the pennies they could charge for using it as training data.
For example, I recently asked ChatGPT to summarize book 1 of the Wheel of Time series because I had read it a long time ago but forgot most of it. Getting the summary and being able to ask a bunch of questions to bring myself up to speed made me more excited to read book 2.
Hold your own keys!
It makes you sexy!