pull down to refresh

Debug-gym (available on GitHub and detailed in a blog post) is an environment that allows AI models to try and debug any existing code repository with access to debugging tools that aren't historically part of the process for these models. Microsoft found that without this approach, models are quite notably bad at debugging tasks. With the approach, they're better but still a far cry from what an experienced human developer can do.
Agents using debugging tools drastically outperformed those that didn't, but their success rate still wasn't high enough. Credit: Microsoft Research