Long-context language modeling on LongBench V2 (test)
[Chart: Acc (Short) over time. Plotted point: TRIM-KV, 35.39, Dec 3, 2025. Selectable metrics: Acc (Short), Acc (Medium), Acc (Easy), Acc (Hard), Avg Acc, Delta (%).]
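Evaluation Results

| Method  | Backbone       | Date    | Acc (Short) | Acc (Medium) | Acc (Easy) | Acc (Hard) | Avg Acc | Delta (%) |
|---------|----------------|---------|-------------|--------------|------------|------------|---------|-----------|
| TRIM-KV | Phi3-mini-128K | 2025.12 | 35.39       | 20.93        | 34.44      | 28.74      | 30.68   | 6.56      |
| Full KV | Phi3-mini-128K | 2025.12 | 33.71       | 18.6         | 34.44      | 25.86      | 28.79   | 0         |
| LocRet  | Phi3-mini-128K | 2025.12 | 32.02       | 19.78        | 26.67      | 28.74      | 28.03   | -2.64     |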
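
The page does not define the Delta (%) column. The reported values are consistent with the relative change in Avg Acc against the Full KV baseline. That reading is an assumption, checked only against the three rows listed here. A minimal Python sketch of the check follows; the function name `relative_delta` and the `avg_acc` dictionary are hypothetical names introduced for illustration.

```python
# Assumption: Delta (%) = (Avg Acc - baseline Avg Acc) / baseline Avg Acc * 100,
# with Full KV as the baseline. This is inferred from the numbers, not stated by the page.

def relative_delta(avg_acc: float, baseline_avg_acc: float) -> float:
    """Relative change of avg_acc against baseline_avg_acc, in percent."""
    return (avg_acc - baseline_avg_acc) / baseline_avg_acc * 100.0

# Avg Acc values copied from the Evaluation Results rows.
avg_acc = {
    "TRIM-KV": 30.68,
    "Full KV": 28.79,
    "LocRet": 28.03,
}

baseline = avg_acc["Full KV"]

# Round to two decimals to match the precision shown on the page.
deltas = {name: round(relative_delta(acc, baseline), 2) for name, acc in avg_acc.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}%")
```

Under this assumption the computed values match the listed ones (6.56 for TRIM-KV, 0 for Full KV, -2.64 for LocRet), so Delta (%) reads as a relative improvement in percent rather than a difference in percentage points.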