Long-context Language Modeling on LongBench (MultiFieldQA, MuSiQue, GovReport 2023 test)
[Chart: MultiFieldQA Score. Top entry: DroPE, 32.18 (Dec 13, 2025). Other available metrics: MuSiQue Score, GovReport Score, Average Score.]
Evaluation Results

| Method | Base Model | Date | MultiFieldQA Score | MuSiQue Score | GovReport Score | Average Score |
|---|---|---|---|---|---|---|
| DroPE | SmolLM-1.7B... | 2025.12 | 32.18 | 7.53 | 24.77 | 21.49 |
| YaRN | SmolLM-1.7B | 2025.12 | 27.60 | 3.90 | 17.19 | 16.23 |
| RoPE-NTK | SmolLM-1.7B | 2025.12 | 27.58 | 3.37 | 24.65 | 18.53 |
| DroPE | Llama2-7B,... | 2025.12 | 25.90 | 12.88 | 39.47 | 26.08 |
| YaRN | Llama2-7B | 2025.12 | 23.13 | 7.65 | 26.65 | 19.14 |
| RoPE-NTK | Llama2-7B | 2025.12 | 21.81 | 10.91 | 32.91 | 21.88 |
| Base | Llama2-7B | 2025.12 | 17.26 | 10.43 | 32.41 | 20.03 |
| Base | SmolLM-1.7B | 2025.12 | 4.12 | 0.50 | 4.70 | 3.11 |
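
The Average Score column looks like the unweighted mean of the three task scores. The sketch below checks that assumption against the rows listed above, reading MuSiQue scores on the same 0-100 scale as the other columns (7.53, 3.90, and so on). The names `ROWS`, `average_score`, and `mismatches` are illustrative and not part of any leaderboard API.

```python
from statistics import mean

# (method, base model, MultiFieldQA, MuSiQue, GovReport, reported average)
# Base model labels are copied as shown on the page, including truncation.
ROWS = [
    ("DroPE", "SmolLM-1.7B...", 32.18, 7.53, 24.77, 21.49),
    ("YaRN", "SmolLM-1.7B", 27.60, 3.90, 17.19, 16.23),
    ("RoPE-NTK", "SmolLM-1.7B", 27.58, 3.37, 24.65, 18.53),
    ("DroPE", "Llama2-7B,...", 25.90, 12.88, 39.47, 26.08),
    ("YaRN", "Llama2-7B", 23.13, 7.65, 26.65, 19.14),
    ("RoPE-NTK", "Llama2-7B", 21.81, 10.91, 32.91, 21.88),
    ("Base", "Llama2-7B", 17.26, 10.43, 32.41, 20.03),
    ("Base", "SmolLM-1.7B", 4.12, 0.50, 4.70, 3.11),
]


def average_score(multifieldqa, musique, govreport):
    """Unweighted mean of the three task scores, rounded to two decimals.

    Assumption: this is how the Average Score column is derived.
    """
    return round(mean([multifieldqa, musique, govreport]), 2)


def mismatches(rows, tol=0.01):
    """Return rows whose reported average differs from the recomputed one."""
    bad = []
    for method, base, mfqa, musique, gov, reported in rows:
        computed = average_score(mfqa, musique, gov)
        if abs(computed - reported) > tol:
            bad.append((method, base, reported, computed))
    return bad


if __name__ == "__main__":
    for method, base, mfqa, musique, gov, reported in ROWS:
        computed = average_score(mfqa, musique, gov)
        print(f"{method:9s} {base:15s} reported={reported:6.2f} computed={computed:6.2f}")
    print("mismatches:", mismatches(ROWS))
```

Under this reading every recomputed average agrees with the reported one to within rounding, which supports treating the MuSiQue scores as two-decimal values rather than integers.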