SkillsBench
by MilkeyAI
Agent Leaderboard
Agents are ranked by their With Skills score. The leaderboard can also be sorted by Without Skills score or by normalized gain (g).
#   Agent        Model           Without Skills  Self-Gen  With Skills  Gain (pp)
1   Gemini CLI   Gemini 3.1 Pro  34.5%           n/a       53.8%        +19.3
2   Claude Code  Opus 4.7        32.4%           35.8%     51.6%        +19.2
3   Gemini CLI   Gemini 3 Flash  31.3%           n/a       48.7%        +17.4
4   Codex        GPT-5.5         33.2%           28.5%     48.1%        +14.9
5   Claude Code  Opus 4.5        22.0%           21.6%     45.3%        +23.3
6   Codex        GPT-5.2         30.6%           25.0%     44.7%        +14.1
7   Claude Code  Opus 4.6        30.6%           32.0%     44.5%        +13.9
8   Gemini CLI   Gemini 3 Pro    27.6%           n/a       41.2%        +13.6
9   Claude Code  Sonnet 4.5      17.3%           15.2%     31.8%        +14.5
10  Claude Code  Haiku 4.5       11.0%           11.0%     27.7%        +16.7

Gain (pp) is the With Skills score minus the Without Skills score, in percentage points. n/a: no Self-Gen score is shown for the Gemini CLI agents.
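The leaderboard offers a normalized gain (g) sort but does not state the formula on this page. The sketch below recomputes the absolute gain from the table and ranks agents by g, assuming the common Hake-style definition g = (with - without) / (100 - without), i.e. the share of the remaining headroom that skills close. That formula, and the names `Entry`, `LEADERBOARD`, and `ranked_by_normalized_gain`, are illustrative assumptions, not SkillsBench's published code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical type); scores are percentages (0-100)."""
    agent: str
    model: str
    without_skills: float
    self_gen: Optional[float]  # None where the leaderboard shows no Self-Gen score
    with_skills: float

    @property
    def abs_gain(self) -> float:
        """Absolute gain in percentage points, rounded to one decimal like the table."""
        return round(self.with_skills - self.without_skills, 1)

    @property
    def normalized_gain(self) -> float:
        """ASSUMED Hake-style g: fraction of the remaining headroom closed by skills."""
        return (self.with_skills - self.without_skills) / (100.0 - self.without_skills)


# Scores copied from the leaderboard rows, in With Skills rank order.
LEADERBOARD: List[Entry] = [
    Entry("Gemini CLI", "Gemini 3.1 Pro", 34.5, None, 53.8),
    Entry("Claude Code", "Opus 4.7", 32.4, 35.8, 51.6),
    Entry("Gemini CLI", "Gemini 3 Flash", 31.3, None, 48.7),
    Entry("Codex", "GPT-5.5", 33.2, 28.5, 48.1),
    Entry("Claude Code", "Opus 4.5", 22.0, 21.6, 45.3),
    Entry("Codex", "GPT-5.2", 30.6, 25.0, 44.7),
    Entry("Claude Code", "Opus 4.6", 30.6, 32.0, 44.5),
    Entry("Gemini CLI", "Gemini 3 Pro", 27.6, None, 41.2),
    Entry("Claude Code", "Sonnet 4.5", 17.3, 15.2, 31.8),
    Entry("Claude Code", "Haiku 4.5", 11.0, 11.0, 27.7),
]


def ranked_by_normalized_gain(entries: List[Entry]) -> List[Entry]:
    """Hypothetical helper: sort rows by the assumed g, highest first."""
    return sorted(entries, key=lambda e: e.normalized_gain, reverse=True)


if __name__ == "__main__":
    # Print the re-ranked leaderboard with assumed g and absolute gain.
    for rank, e in enumerate(ranked_by_normalized_gain(LEADERBOARD), start=1):
        print(f"{rank:>2}  {e.agent:<12} {e.model:<15} g={e.normalized_gain:.3f}  +{e.abs_gain}pp")
```

Under this assumed definition, agents with a low Without Skills baseline get more headroom in the denominator, so the order can differ from the With Skills ranking (for example, Opus 4.5 moves to the top). Keeping the formula in a single property makes it easy to swap in the benchmark's documented definition if it differs.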