Comparison on Loong Set 3: 100K–200K Tokens
[Chart: LLM Score and EM over time; highlighted point: Disco-RAG, LLM Score 57.84, Jan 7, 2026]
Updated 4d ago
Evaluation Results
Method         Setting                   Date     LLM Score  EM
Disco-RAG      Base Model=Llama-3.3-70B  2026.01  57.84      28
StructRAG      Condition=SOTA Results    2026.01  57.74      35
Disco-RAG      Base Model=Qwen2.5-72B    2026.01  56.89      19
Disco-RAG      Base Model=Llama-3.1-8B   2026.01  55.8       11
RQ-RAG         Condition=SOTA Results    2026.01  44.62      0
Llama-3.3-70B  Condition=Standard RAG    2026.01  43.7       6
Qwen2.5-72B    Condition=Standard RAG    2026.01  41.83      4
Llama-3.3-70B  Condition=Full Context    2026.01  41.11      14
Llama-3.1-8B   Condition=Standard RAG    2026.01  40.24      3
Qwen2.5-72B    Condition=Full Context    2026.01  40.13      13
Llama-3.1-8B   Condition=Full Context    2026.01  37.43      12
GraphRAG       Condition=SOTA Results    2026.01  27.4       0