
Question Answering on OpsEval Hard

85.9 Accuracy (%)

Qwen3-Max-2025-09-23

Updated 2mo ago

Evaluation Results

Method	Date	Accuracy (%)
Qwen3-Max-2025-09-23	2026.04	85.9
Deepseek-v3.2-exp	2026.04	85.6
GPT-5.2	2026.04	84.5
Qwen-Plus-2025-09-11	2026.04	84.2
Moonshot-Kimi-K2-Instruct	2026.04	83.8
Qwen3-Next-80b-a3b-Thinking	2026.04	83.8
OpsLLM-32B	2026.04	79.9
OpsLLM-14B	2026.04	79.0
Qwen2.5-32B-Instruct	2026.04	76.8
Qwen-Turbo-2025-07-15	2026.04	76.0
Qwen2.5-14B-Instruct	2026.04	73.3
R1-Distill-SRE-Qwen-32B-INT8	2026.04	66.6
OpsLLM-7B	2026.04	66.3
Qwen2.5-7B-Instruct	2026.04	66.1
aiops-qwen-4b	2026.04	58.7
R1-Distill-SRE-Qwen-7B	2026.04	35.8
Zhiyu-32B	2026.04	14.9