We make a plan to earn a top score on the SWE-bench Verified benchmark.
We pull the 500 samples into a web UI for easy inspection (super smooth thanks to Convex.dev!), then decide to focus first on the psf/requests repo.
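As a rough sketch of that narrowing step, here is how the samples could be grouped by repository and filtered down to psf/requests. The records below are made-up placeholders, and the `instance_id` and `repo` field names are assumptions about the sample format. This is not the code behind the web UI.

```python
from collections import Counter

# Hypothetical placeholder records; the real benchmark has 500 samples.
# Field names ("instance_id", "repo") are assumed for illustration.
samples = [
    {"instance_id": "example-1", "repo": "psf/requests"},
    {"instance_id": "example-2", "repo": "django/django"},
    {"instance_id": "example-3", "repo": "psf/requests"},
    {"instance_id": "example-4", "repo": "sympy/sympy"},
]


def count_by_repo(records):
    """Count how many samples belong to each repository."""
    return Counter(r["repo"] for r in records)


def select_repo(records, repo):
    """Keep only the samples for one repository, preserving order."""
    return [r for r in records if r["repo"] == repo]


counts = count_by_repo(samples)
focus = select_repo(samples, "psf/requests")

print(counts.most_common())
print([r["instance_id"] for r in focus])
```

Counting per repo first makes it easy to see how much of the benchmark a single repository covers before committing to it.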
Next we index!
Watch on X: https://x.com/OpenAgentsInc/status/1823896252460704139