Chain-of-reasoning on Loong Set 4: 200K–250K Tokens
[Chart: LLM Score by method. Highlighted point: Disco-RAG, LLM Score 36.17. Date shown: Jan 7, 2026. Available metrics: LLM Score, EM.]
Updated 4d ago
Evaluation Results
| Method | Links | LLM Score | EM |
|---|---|---|---|
| Disco-RAG (Base Model=Qwen2.5-72B) | 2026.01 | 36.17 | 0.06 |
| Disco-RAG (Base Model=Llama-3.3-70B) | 2026.01 | 36.06 | 0.06 |
| Disco-RAG (Base Model=Llama-3.1-8B) | 2026.01 | 36 | 0.03 |
| StructRAG (Condition=SOTA Results) | 2026.01 | 35.71 | 0.05 |
| RQ-RAG (Condition=SOTA Results) | 2026.01 | 34.69 | 0 |
| GraphRAG (Condition=SOTA Results) | 2026.01 | 33.67 | 0.33 |
| Llama-3.3-70B (Condition=Standard RAG) | 2026.01 | 31.33 | 0.02 |
| Llama-3.3-70B (Condition=Full Context) | 2026.01 | 30.17 | 0 |
| Qwen2.5-72B (Condition=Standard RAG) | 2026.01 | 30.02 | 0.01 |
| Llama-3.1-8B (Condition=Standard RAG) | 2026.01 | 29.92 | 0 |
| Qwen2.5-72B (Condition=Full Context) | 2026.01 | 28.48 | 0 |
| Llama-3.1-8B (Condition=Full Context) | 2026.01 | 26.76 | 0 |