SWE-bench Verified maxed out, and it's time to build your own private coding evals
OpenAI is moving on from SWE-bench Verified because the benchmark has degraded. It’s a harsh reminder that public leaderboards cannot replace private evaluations built on your own codebase.