They are also comparing how the AI does against each previous model.
Of course it would be better.

The PhDs they compare it with also have a base to start from, yet ChatGPT o1 outperforms them. The article adds nuance to both this point and the originally cited statement, so it's worth reading the actual article.

south_korea_ln

It worked because the AI had a base to start from.

It's not like it did all the calculations without guidance.

science

‘In awe’: scientists impressed by latest ChatGPT model o1
