It's cool to see The AI Replacement Hypothesis not playing out in this case. Optimistically, I suspect AI will work like this in most industries: instead of replacing humans, AI pushes humans to do more of the things that we value humans doing. At the limit, I suspect we get the Jevons paradox: AI makes things more efficient, things get cheaper and attract more customers, and the human in the loop just serves more customers in the same amount of time.
Three things explain this. First, while models beat humans on benchmarks, the standardized tests designed to measure AI performance, they struggle to replicate this performance in hospital conditions. Most tools can only diagnose abnormalities that are common in training data, and models often don’t work as well outside of their test conditions. Second, attempts to give models more tasks have run into legal hurdles: regulators and medical insurers so far are reluctant to approve or cover fully autonomous radiology models. Third, even when they do diagnose accurately, models replace only a small share of a radiologist’s job. Human radiologists spend a minority of their time on diagnostics and the majority on other activities, like talking to patients and fellow clinicians.
The performance of a tool can drop by as much as 20 percentage points when it is tested out of sample, on data from other hospitals. In one study, a pneumonia detection model trained on chest X-rays from a single hospital performed substantially worse when tested at a different hospital.
Radiologists are useful for more than reading scans; a study that followed staff radiologists in three different hospitals in 2012 found that only 36 percent of their time was dedicated to direct image interpretation. More time is spent on overseeing imaging examinations, communicating results and recommendations to the treating clinicians and occasionally directly to patients, teaching radiology residents and technologists who conduct the scans, and reviewing imaging orders and changing scanning protocols. This means that, if AI were to get better at interpreting scans, radiologists might simply shift their time toward other tasks, which would reduce the substitution effect of AI.
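To make the arithmetic concrete, here is a rough sketch of how much of a radiologist's total working time would be freed if AI took over some fraction of image interpretation. The 36 percent figure is the one from the study above; the function name and the automation fractions are hypothetical, chosen only for illustration, and the sketch assumes time on all other tasks stays unchanged.

```python
# Share of working time spent on direct image interpretation
# (the 36 percent figure from the 2012 study mentioned above).
INTERPRETATION_SHARE = 0.36


def freed_time_share(automated_fraction: float,
                     interpretation_share: float = INTERPRETATION_SHARE) -> float:
    """Fraction of total working time freed if AI takes over
    `automated_fraction` of interpretation work.

    Assumption (not from the study): time spent on every other
    task is unaffected by the automation.
    """
    if not 0.0 <= automated_fraction <= 1.0:
        raise ValueError("automated_fraction must be between 0 and 1")
    return interpretation_share * automated_fraction


# Hypothetical automation levels, for illustration only.
for fraction in (0.25, 0.5, 1.0):
    freed = freed_time_share(fraction)
    print(f"{fraction:.0%} of interpretation automated -> "
          f"{freed:.0%} of total time freed")
```

Under these assumptions, even complete automation of image interpretation frees at most 36 percent of a radiologist's time, which is why the substitution effect is smaller than benchmark results alone would suggest.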
Multi-task foundation models may widen coverage, and more varied training sets could blunt data gaps. But many hurdles cannot be removed with better models alone: the need to counsel the patient, shoulder malpractice risk, and receive accreditation from regulators. Each hurdle makes full substitution the expensive, risky option and human plus machine the default.