AI Benchmarks

Track live AI benchmark rankings, including coding and math scores, across leading models from OpenAI, Anthropic, Google, Meta, DeepSeek, and more.

#	Model	Org	Intelligence ↓	Coding	Math	MMLU Pro	GPQA	LiveCodeBench	AIME 2025	MATH 500	SciCode	IFBench	HLE
1	Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)	Anthropic	59.9	76.5	—	—	9260.0	—	—	—	6020.0	6346.9	5330.0
2	Claude Opus 4.8 (Adaptive Reasoning, Max Effort)	Anthropic	55.7	74.3	—	—	9200.0	—	—	—	5350.0	6224.5	4570.0
3	GPT-5.5 (xhigh)	OpenAI	54.8	74.9	—	—	9350.0	—	—	—	5610.0	7585.0	4430.0
4	Claude Opus 4.7 (Adaptive Reasoning, Max Effort)	Anthropic	53.5	73.6	—	—	9140.0	—	—	—	5450.0	5863.9	3960.0
5	GPT-5.5 (high)	OpenAI	53.1	71.6	—	—	9320.0	—	—	—	5590.0	7163.3	4300.0
6	GPT-5.4 (xhigh)	OpenAI	51.4	71.1	—	—	9200.0	—	—	—	5660.0	7394.6	4160.0
7	GLM-5.2 (max)	Z AI	51.1	68.8	—	—	8950.0	—	—	—	5050.0	7333.3	4010.0
8	GPT-5.5 (medium)	OpenAI	50.4	71.5	—	—	9260.0	—	—	—	5350.0	7095.2	4060.0
9	Gemini 3.5 Flash (high)	Google	50.2	70.1	—	—	9220.0	—	—	—	5310.0	7632.7	4100.0
10	Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)	Anthropic	47.2	63.0	—	—	8750.0	—	—	—	4680.0	5659.9	3000.0
11	Gemini 3.1 Pro Preview	Google	46.5	68.8	—	—	9410.0	—	—	—	5890.0	7714.3	4470.0
12	Qwen3.7 Max	Alibaba	46.0	66.0	—	—	9230.0	—	—	—	4880.0	8054.4	3810.0
13	Gemini 3.5 Flash (medium)	Google	45.4	—	—	—	9210.0	—	—	—	5300.0	7455.8	3990.0
14	MiniMax-M3	MiniMax	44.4	58.6	—	—	9290.0	—	—	—	4540.0	8285.7	3710.0
15	GPT-5.3 Codex (xhigh)	OpenAI	44.3	—	—	—	9150.0	—	—	—	5320.0	7537.4	3990.0
16	DeepSeek V4 Pro (Reasoning, Max Effort)	DeepSeek	44.3	59.4	—	—	8880.0	—	—	—	5000.0	7646.3	3590.0
17	Claude Opus 4.6 (Adaptive Reasoning, Max Effort)	Anthropic	43.7	—	—	—	8960.0	—	—	—	5190.0	5312.9	3670.0
18	GPT-5.5 (low)	OpenAI	43.5	60.9	—	—	9100.0	—	—	—	5160.0	6435.4	3100.0
19	Muse Spark	Meta	43.1	58.6	—	—	8840.0	—	—	—	5150.0	7591.8	3990.0
20	Kimi K2.6	Kimi	42.8	56.0	—	—	9110.0	—	—	—	5350.0	7598.6	3590.0
21	Claude Opus 4.7 (Non-reasoning, High Effort)	Anthropic	42.7	—	—	—	8850.0	—	—	—	5010.0	4360.5	3120.0
22	MiMo-V2.5-Pro	Xiaomi	42.2	60.2	—	—	8660.0	—	—	—	5020.0	7986.4	3380.0
23	GPT-5.2 (xhigh)	OpenAI	42.2	—	99.0	8740.0	9030.0	8890.0	9900.0	—	5210.0	7544.2	3540.0
24	Kimi K2.7 Code	Kimi	41.9	60.8	—	—	8960.0	—	—	—	4750.0	6312.9	3280.0
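
For readers who want to work with the table programmatically, the following is a minimal sketch, assuming the table has been copied as tab-separated text. The function name `parse_leaderboard` and its `rescale` flag are hypothetical and not part of any published API. Several columns (for example GPQA, SciCode, IFBench, and HLE) show values such as 9350.0, which appear to be percentages multiplied by 100; the row that reports both Math 99.0 and AIME 2025 9900.0 is consistent with that reading. The sketch divides any score above 100 by 100 under that assumption, and the behavior can be switched off.

```python
MISSING = "\u2014"      # dash used in the table for unreported scores
SORT_ARROW = "\u2193"   # sort indicator attached to the "Intelligence" header
TEXT_COLUMNS = {"Model", "Org"}


def parse_leaderboard(text, rescale=True):
    """Parse tab-separated leaderboard text into a list of row dicts.

    Unreported scores become None. When `rescale` is True, scores above
    100 are divided by 100 (an assumption about the source's scaling).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [h.replace(SORT_ARROW, "").strip() for h in lines[0].split("\t")]
    rows = []
    for line in lines[1:]:
        row = {}
        for name, raw in zip(header, line.split("\t")):
            raw = raw.strip()
            if name in TEXT_COLUMNS:
                row[name] = raw
            elif name == "#":
                row[name] = int(raw)
            elif raw == MISSING:
                row[name] = None
            else:
                value = float(raw)
                # ASSUMPTION: scores above 100 are percentages multiplied by 100.
                if rescale and value > 100:
                    value /= 100
                row[name] = value
        rows.append(row)
    return rows


# Two rows copied from the table above, joined with the header line.
SAMPLE = "\n".join([
    "#\tModel\tOrg\tIntelligence \u2193\tCoding\tMath\tMMLU Pro\tGPQA\t"
    "LiveCodeBench\tAIME 2025\tMATH 500\tSciCode\tIFBench\tHLE",
    "3\tGPT-5.5 (xhigh)\tOpenAI\t54.8\t74.9\t\u2014\t\u2014\t9350.0\t"
    "\u2014\t\u2014\t\u2014\t5610.0\t7585.0\t4430.0",
    "23\tGPT-5.2 (xhigh)\tOpenAI\t42.2\t\u2014\t99.0\t8740.0\t9030.0\t"
    "8890.0\t9900.0\t\u2014\t5210.0\t7544.2\t3540.0",
])

if __name__ == "__main__":
    for row in parse_leaderboard(SAMPLE):
        print(row["#"], row["Model"], row["GPQA"], row["HLE"])
```

Passing `rescale=False` keeps the numbers exactly as they appear in the table, which is the safer choice if the scaling assumption turns out not to hold for a given column.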