Question Answering on OpsEval Hard
[Chart: Accuracy (%) over time. Top result: Qwen3-Max-2025-09-23, 85.9%. Date shown: Apr 6, 2026.]
Evaluation Results
| Method | LLM Category | Date | Accuracy (%) |
|---|---|---|---|
| Qwen3-Max-2025-09-23 | Closed-source | 2026.04 | 85.9 |
| Deepseek-v3.2-exp | Open-source | 2026.04 | 85.6 |
| GPT-5.2 | Closed-source | 2026.04 | 84.5 |
| Qwen-Plus-2025-09-11 | Closed-source | 2026.04 | 84.2 |
| Moonshot-Kimi-K2-Instruct | Open-source | 2026.04 | 83.8 |
| Qwen3-Next-80b-a3b-Thinking | Open-source | 2026.04 | 83.8 |
| OpsLLM-32B | OpsLLM | 2026.04 | 79.9 |
| OpsLLM-14B | OpsLLM | 2026.04 | 79.0 |
| Qwen2.5-32B-Instruct | Base LLM | 2026.04 | 76.8 |
| Qwen-Turbo-2025-07-15 | Closed-source | 2026.04 | 76.0 |
| Qwen2.5-14B-Instruct | Base LLM | 2026.04 | 73.3 |
| R1-Distill-SRE-Qwen-32B-INT8 | Open-source | 2026.04 | 66.6 |
| OpsLLM-7B | OpsLLM | 2026.04 | 66.3 |
| Qwen2.5-7B-Instruct | Base LLM | 2026.04 | 66.1 |
| aiops-qwen-4b | Open-source | 2026.04 | 58.7 |
| R1-Distill-SRE-Qwen-7B | Open-source | 2026.04 | 35.8 |
| Zhiyu-32B | Open-source | 2026.04 | 14.9 |