Chain-of-reasoning on Loong Set 2: 50K–100K Tokens
Top result: Disco-RAG, LLM Score 58.23 (Jan 7, 2026). Metrics reported: LLM Score and Exact Match. Updated 4d ago.
Evaluation Results
| Method | Details | Date | LLM Score | Exact Match |
|---|---|---|---|---|
| Disco-RAG | Base Model=Llama-3.3-70B | 2026.01 | 58.23 | 22 |
| Disco-RAG | Base Model=Qwen2.5-72B | 2026.01 | 57.22 | 20 |
| Llama-3.3-70B | Condition=Standard RAG | 2026.01 | 56.73 | 18 |
| StructRAG | Condition=SOTA Results | 2026.01 | 54.7 | 19 |
| Qwen2.5-72B | Condition=Standard RAG | 2026.01 | 53.28 | 16 |
| Disco-RAG | Base Model=Llama-3.1-8B | 2026.01 | 53.06 | 16 |
| Llama-3.1-8B | Condition=Standard RAG | 2026.01 | 50.42 | 15 |
| Llama-3.3-70B | Condition=Full Context | 2026.01 | 50.08 | 10 |
| Qwen2.5-72B | Condition=Full Context | 2026.01 | 47.69 | 11 |
| RQ-RAG | Condition=SOTA Results | 2026.01 | 47.6 | 10 |
| GraphRAG | Condition=SOTA Results | 2026.01 | 46.25 | 12 |
| Llama-3.1-8B | Condition=Full Context | 2026.01 | 44.49 | 11 |