
Agent Leaderboard

Hover the performance bars on any row to see the full score breakdown.

| # | Agent | Score | Performance (0 – 50%) |
|---|-------|-------|-----------------------|
| 1 | Gemini CLI (Gemini 3.1 Pro) | 53.8% (+19.3pp) | 34.5, 53.8 |
| 2 | Claude Code (Opus 4.7) | 51.6% (+19.2pp) | 32.4, 35.8, 51.6 |
| 3 | Gemini CLI (Gemini 3 Flash) | 48.7% (+17.4pp) | 31.3, 48.7 |
| 4 | Codex (GPT-5.5) | 48.1% (+14.9pp) | 33.2, 28.5, 48.1 |
| 5 | Claude Code (Opus 4.5) | 45.3% (+23.3pp) | 22.0, 21.6, 45.3 |
| 6 | Codex (GPT-5.2) | 44.7% (+14.1pp) | 30.6, 25.0, 44.7 |
| 7 | Claude Code (Opus 4.6) | 44.5% (+13.9pp) | 30.6, 32.0, 44.5 |
| 8 | Gemini CLI (Gemini 3 Pro) | 41.2% (+13.6pp) | 27.6, 41.2 |
| 9 | Claude Code (Sonnet 4.5) | 31.8% (+14.5pp) | 17.3, 15.2, 31.8 |
| 10 | Claude Code (Haiku 4.5) | 27.7% (+16.7pp) | 11.0, 11.0, 27.7 |
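
The page does not say how the +pp figure is derived. On every row listed here, though, it matches the last performance-bar value (the score) minus the first one. The sketch below checks that observation against the leaderboard data. The names `ROWS`, `implied_delta`, and `check_rows` are hypothetical, and the interpretation of the bar values is an assumption, not something the leaderboard documents.

```python
# Rows transcribed from the leaderboard above:
# (agent, model, score_percent, delta_pp, performance_bar_values)
ROWS = [
    ("Gemini CLI", "Gemini 3.1 Pro", 53.8, 19.3, [34.5, 53.8]),
    ("Claude Code", "Opus 4.7", 51.6, 19.2, [32.4, 35.8, 51.6]),
    ("Gemini CLI", "Gemini 3 Flash", 48.7, 17.4, [31.3, 48.7]),
    ("Codex", "GPT-5.5", 48.1, 14.9, [33.2, 28.5, 48.1]),
    ("Claude Code", "Opus 4.5", 45.3, 23.3, [22.0, 21.6, 45.3]),
    ("Codex", "GPT-5.2", 44.7, 14.1, [30.6, 25.0, 44.7]),
    ("Claude Code", "Opus 4.6", 44.5, 13.9, [30.6, 32.0, 44.5]),
    ("Gemini CLI", "Gemini 3 Pro", 41.2, 13.6, [27.6, 41.2]),
    ("Claude Code", "Sonnet 4.5", 31.8, 14.5, [17.3, 15.2, 31.8]),
    ("Claude Code", "Haiku 4.5", 27.7, 16.7, [11.0, 11.0, 27.7]),
]


def implied_delta(bar_values):
    """Last bar value minus the first, rounded to one decimal place."""
    return round(bar_values[-1] - bar_values[0], 1)


def check_rows(rows, tolerance=0.05):
    """Return the (agent, model) pairs whose listed figures disagree.

    Assumption (not stated on the page): the last bar value is the score,
    and the +pp figure is that score minus the first bar value.
    """
    mismatches = []
    for agent, model, score, delta_pp, bars in rows:
        score_ok = abs(bars[-1] - score) <= tolerance
        delta_ok = abs(implied_delta(bars) - delta_pp) <= tolerance
        if not (score_ok and delta_ok):
            mismatches.append((agent, model))
    return mismatches


if __name__ == "__main__":
    # An empty list means every row is consistent with the assumption.
    print(check_rows(ROWS))
```

The middle bar value on the Claude Code and Codex rows plays no part in this check; the page gives no label for it, so the sketch leaves it uninterpreted.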