AI Benchmarks
AI benchmark rankings, model scores, and performance data.
Track live AI benchmark rankings, including coding and math scores, across leading models from OpenAI, Anthropic, Google, Meta, DeepSeek, and more.
| # | Model | Org | Intelligence | Coding | Math | MMLU Pro | GPQA | LiveCodeBench | AIME 2025 | MATH 500 | SciCode | IFBench | HLE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) | Anthropic | 59.9 | 76.5 | — | — | 9260.0 | — | — | — | 6020.0 | 6346.9 | 5330.0 |
| 2 | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) | Anthropic | 55.7 | 74.3 | — | — | 9200.0 | — | — | — | 5350.0 | 6224.5 | 4570.0 |
| 3 | GPT-5.5 (xhigh) | OpenAI | 54.8 | 74.9 | — | — | 9350.0 | — | — | — | 5610.0 | 7585.0 | 4430.0 |
| 4 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 53.5 | 73.6 | — | — | 9140.0 | — | — | — | 5450.0 | 5863.9 | 3960.0 |
| 5 | GPT-5.5 (high) | OpenAI | 53.1 | 71.6 | — | — | 9320.0 | — | — | — | 5590.0 | 7163.3 | 4300.0 |
| 6 | GPT-5.4 (xhigh) | OpenAI | 51.4 | 71.1 | — | — | 9200.0 | — | — | — | 5660.0 | 7394.6 | 4160.0 |
| 7 | GLM-5.2 (max) | Z AI | 51.1 | 68.8 | — | — | 8950.0 | — | — | — | 5050.0 | 7333.3 | 4010.0 |
| 8 | GPT-5.5 (medium) | OpenAI | 50.4 | 71.5 | — | — | 9260.0 | — | — | — | 5350.0 | 7095.2 | 4060.0 |
| 9 | Gemini 3.5 Flash (high) | Google | 50.2 | 70.1 | — | — | 9220.0 | — | — | — | 5310.0 | 7632.7 | 4100.0 |
| 10 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 47.2 | 63.0 | — | — | 8750.0 | — | — | — | 4680.0 | 5659.9 | 3000.0 |
| 11 | Gemini 3.1 Pro Preview | Google | 46.5 | 68.8 | — | — | 9410.0 | — | — | — | 5890.0 | 7714.3 | 4470.0 |
| 12 | Qwen3.7 Max | Alibaba | 46.0 | 66.0 | — | — | 9230.0 | — | — | — | 4880.0 | 8054.4 | 3810.0 |
| 13 | Gemini 3.5 Flash (medium) | Google | 45.4 | — | — | — | 9210.0 | — | — | — | 5300.0 | 7455.8 | 3990.0 |
| 14 | MiniMax-M3 | MiniMax | 44.4 | 58.6 | — | — | 9290.0 | — | — | — | 4540.0 | 8285.7 | 3710.0 |
| 15 | GPT-5.3 Codex (xhigh) | OpenAI | 44.3 | — | — | — | 9150.0 | — | — | — | 5320.0 | 7537.4 | 3990.0 |
| 16 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 44.3 | 59.4 | — | — | 8880.0 | — | — | — | 5000.0 | 7646.3 | 3590.0 |
| 17 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 43.7 | — | — | — | 8960.0 | — | — | — | 5190.0 | 5312.9 | 3670.0 |
| 18 | GPT-5.5 (low) | OpenAI | 43.5 | 60.9 | — | — | 9100.0 | — | — | — | 5160.0 | 6435.4 | 3100.0 |
| 19 | Muse Spark | Meta | 43.1 | 58.6 | — | — | 8840.0 | — | — | — | 5150.0 | 7591.8 | 3990.0 |
| 20 | Kimi K2.6 | Kimi | 42.8 | 56.0 | — | — | 9110.0 | — | — | — | 5350.0 | 7598.6 | 3590.0 |
| 21 | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 42.7 | — | — | — | 8850.0 | — | — | — | 5010.0 | 4360.5 | 3120.0 |
| 22 | MiMo-V2.5-Pro | Xiaomi | 42.2 | 60.2 | — | — | 8660.0 | — | — | — | 5020.0 | 7986.4 | 3380.0 |
| 23 | GPT-5.2 (xhigh) | OpenAI | 42.2 | — | 99.0 | 8740.0 | 9030.0 | 8890.0 | 9900.0 | — | 5210.0 | 7544.2 | 3540.0 |
| 24 | Kimi K2.7 Code | Kimi | 41.9 | 60.8 | — | — | 8960.0 | — | — | — | 4750.0 | 6312.9 | 3280.0 |