Code Weekly
Coding Benchmarks
AI Model Coding Benchmark Rankings
Data collected: 2026-04-16 · Arena data: 2026-04-16
Arena Coding
Claude Opus 4.6 Thinking
1548
SWE-bench
live-SWE-agent + Claude 4.5 Opus medium (20251101)
79.2%
Aider
gpt-5 (high)
88.0%
LiveCodeBench
O4-Mini (High)
87.3%
Arms Race
Arena Coding Elo trend for each vendor's strongest model · Updated daily
Arena Coding
Elo ranking of coding ability based on user votes · arena.ai/leaderboard/webdev · Data updated: 2026-04-16
1. Claude Opus 4.6 Thinking · 1548
2. Claude Opus 4.6 · 1545
3. GLM 5.1 · 1537
4. Claude Sonnet 4.6 · 1524
5. Claude Opus 4.5 20251101 Thinking 32k · 1490
6. Claude Opus 4.5 20251101 · 1468
7. GPT-5.4 High (codex Harness) · 1457
8. Gemini 3.1 Pro Preview · 1454
9. Qwen3.6 Plus Preview · 1453
10. GLM 5 · 1440
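As a rough guide to reading the Arena Coding ratings, the conventional Elo formula converts a rating gap into an expected head-to-head win rate. The sketch below assumes the standard logistic form with a 400-point scale. The page does not say which formula or scale Arena uses, so the function name `expected_win_rate` and its constants are illustrative assumptions rather than Arena's actual method.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of votes for A over B under the conventional Elo model.

    Assumption: standard logistic Elo with a 400-point scale; the
    leaderboard itself may be computed differently.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the Arena Coding list (data dated 2026-04-16).
top = 1548     # Claude Opus 4.6 Thinking
second = 1545  # Claude Opus 4.6
tenth = 1440   # GLM 5

print(f"1st vs 2nd:  {expected_win_rate(top, second):.3f}")
print(f"1st vs 10th: {expected_win_rate(top, tenth):.3f}")
```

Under these assumptions, the 3-point gap between the top two entries is close to a coin flip, while the 108-point gap between first and tenth corresponds to an expected win rate of roughly 65%.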
SWE-bench
Ability to fix real GitHub issues · swebench.com
1. live-SWE-agent + Claude 4.5 Opus medium (20251101) · 79.2%
2. Sonar Foundation Agent + Claude 4.5 Opus · 79.2%
3. TRAE + Doubao-Seed-Code · 78.8%
4. live-SWE-agent + Gemini 3 Pro Preview (2025-11-18) · 77.4%
5. Atlassian Rovo Dev (2025-09-02) · 76.8%
6. EPAM AI/Run Developer Agent v20250719 + Claude 4 Sonnet · 76.8%
7. mini-SWE-agent + Claude 4.5 Opus (high reasoning) · 76.8%
8. ACoder · 76.4%
9. mini-SWE-agent + Gemini 3 Flash (high reasoning) · 75.8%
10. mini-SWE-agent + MiniMax M2.5 (high reasoning) · 75.8%
Aider
Code editing pass rate · aider.chat · Data updated: 2026-03-17
1. gpt-5 (high) · 88.0%
2. gpt-5 (medium) · 86.7%
3. o3-pro (high) · 84.9%
4. gemini-2.5-pro-preview-06-05 (32k think) · 83.1%
5. o3 (high) · 81.3%
6. gpt-5 (low) · 81.3%
7. grok-4 (high) · 79.6%
8. gemini-2.5-pro-preview-06-05 (default think) · 79.1%
9. o3 (high) + gpt-4.1 · 78.2%
10. Gemini 2.5 Pro Preview 05-06 · 76.9%
LiveCodeBench
Competitive programming (LeetCode / Codeforces) · livecodebench.github.io
1. O4-Mini (High) · 87.3%
2. O3 (High) · 84.7%
3. O4-Mini (Medium) · 84.5%
4. DeepSeek-R1-0528 · 84.4%
5. Gemini-2.5-Pro-06-05 · 84.3%
6. Gemini-2.5-Pro-05-06 · 82.7%
7. OpenReasoning-Nemotron-32B · 81.0%
8. EXAONE-4.0-32B · 80.9%
9. Qwen3-235B-A22B · 80.4%
10. XBai-o4-medium · 80.1%
Automatically updated daily · Arena · SWE-bench · Aider · LiveCodeBench