We make a plan to get a high score on the SWE-bench Verified benchmark.
We pull the 500 samples into a web UI for easy inspection -- super smooth thanks to Convex.dev! -- then decide to focus first on the psf/requests repo.
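Narrowing the samples to one repo can be sketched roughly like this. It is a minimal illustration, not the code from the video: the three records are made-up placeholders, the helper names `count_by_repo` and `select_repo` are hypothetical, and it assumes each sample exposes `instance_id` and `repo` fields.

```python
from collections import Counter

# Made-up placeholder records standing in for the real benchmark samples.
# Assumption: each sample is a dict with at least "instance_id" and "repo".
samples = [
    {"instance_id": "psf__requests-0001", "repo": "psf/requests"},
    {"instance_id": "django__django-0001", "repo": "django/django"},
    {"instance_id": "psf__requests-0002", "repo": "psf/requests"},
]


def count_by_repo(rows):
    """Count how many samples belong to each repository."""
    return Counter(row["repo"] for row in rows)


def select_repo(rows, repo):
    """Keep only the samples from the given repository."""
    return [row for row in rows if row["repo"] == repo]


# See how the samples are distributed, then focus on a single repo.
repo_counts = count_by_repo(samples)
requests_samples = select_repo(samples, "psf/requests")

for repo, count in repo_counts.most_common():
    print(f"{repo}: {count}")
```

Counting per repo first makes it easy to see how much of the benchmark a single-repo focus would cover before committing to it.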
Next we index!
Watch on X: