I'm not really clear on how the supposed reasoning capabilities of these LLMs are supposed to work. I understand the basics, but those would only work when the model has been trained on similar problems before: not because it understands them, but because it is good at predicting the likely next step from its training data. Hard math problems are supposed to have one-of-a-kind solutions that likely don't exist in the available training data.