People notice that while AI can now write programs, design websites, etc, it still often makes mistakes or goes in a wrong direction, and then they somehow jump to the conclusion that AI will never be able to do these tasks at human levels, or will only have a minor impact. When just a few years ago, having AI do these things was complete science fiction! Or they see two consecutive model releases and don’t notice much difference in their conversations, and they conclude that AI is plateauing and scaling is over.
Given consistent trends of exponential performance improvements over many years and across many industries, it would be extremely surprising if these improvements suddenly stopped. Instead, even a relatively conservative extrapolation of these trends suggests that 2026 will be a pivotal year for the widespread integration of AI into the economy:
- Models will be able to autonomously work for full days (8 working hours) by mid-2026.
- At least one model will match the performance of human experts across many industries before the end of 2026.
- By the end of 2027, models will frequently outperform experts on many tasks.
I sympathise with the way he calls out the current breathing spell in AI, where negativity seems to be the most common sentiment. Clearly, the usefulness of this tool is just beginning to seep into our society and we haven't seen anything yet. I have a feeling we will return to the hyperbolic extreme of "AI is going to change everything!!" soon enough. I also appreciated this fellow's gratuitous use of exclamation points, a style I call techno auptimist.
gpt-5
regressed againstgpt-4o
in instruction hierarchy - meaning that important instructions are ignored more often - and this has, despite being promised to be mitigated, in almost 2 months not been mitigated according to their own, current, system card. Especially cognitive rulesets, which is what we use in coding instructions, but also in the detection of malicious user messages on chatbots, regressed according to OpenAI themselves from a 73% success factor to 62% ingpt-5
. I've lightly observed similar outcomes inclaude-4-sonnet
vsclaude-3.7-sonnet
in a vibe coding setup.qwen-2.5
toqwen-3
, have in the meantime improved, although that mostly means that they have caught up with the bleeding edge a bit.