In my informal testing (detailed conversations about complex topics), it's still noticeably worse than both GPT-4 and Claude. That's one particular use case, so YMMV, but it's the one I care about.