Code Question Answering on LongBench CodeQA v2
Best accuracy: 0.741, SRLM (no sub-calls), Mar 7, 2026
Updated 1mo ago
Evaluation Results
Method                  Backbone           Date     Accuracy
SRLM (no sub-calls)     GPT-5              2026.03  0.741
SRLM                    GPT-5              2026.03  0.689
RLM (no sub-calls)      GPT-5              2026.03  0.652
SRLM                    Qwen3-Coder-480B   2026.03  0.649
RLM                     Qwen3-Coder-480B   2026.03  0.598
RLM                     GPT-5              2026.03  0.595
SRLM (no sub-calls)     Qwen3-Coder-480B   2026.03  0.59
Summary agent           GPT-5              2026.03  0.58
RLM (no sub-calls)      Qwen3-Coder-480B   2026.03  0.538
Summary agent           Qwen3-Coder-480B   2026.03  0.5
CodeAct (+ sub-calls)   Qwen3-Coder-480B   2026.03  0.26
CodeAct (+ BM25)        Qwen3-Coder-480B   2026.03  0.24
Base Model              GPT-5              2026.03  0.24
CodeAct (+ sub-calls)   GPT-5              2026.03  0.24
CodeAct (+ BM25)        GPT-5              2026.03  0.22
Base Model              Qwen3-Coder-480B   2026.03  0.2