Long-Context Question Answering on SQuAD 512
Top result: IN-CONTEXT, 0.859 Answer F1
[Chart: Answer F1 over time; date shown: Feb 6, 2026]
Updated 4d ago
Evaluation Results
Method        Variant                      Date     Answer F1
IN-CONTEXT    -                            2026.02  0.859
SHINE         Training Data Scale=Fu...    2026.02  0.534
SHINE         Training Data Scale=1/...    2026.02  0.509
SHINE         Training Data Scale=1/...    2026.02  0.492
GEN ADAPTER   -                            2026.02  0.488
NAIVE         -                            2026.02  0.22
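
The page does not define how Answer F1 is computed. As a rough illustration only, the sketch below assumes the token-overlap F1 conventionally used for SQuAD-style answers: normalize both strings, count shared tokens, then take the harmonic mean of precision and recall. The function names `normalize_answer` and `answer_f1` are hypothetical, and the benchmark's actual scoring may differ.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def answer_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer.

    Assumption: SQuAD-style token overlap; not confirmed by the leaderboard.
    """
    pred_tokens = normalize_answer(prediction)
    ref_tokens = normalize_answer(reference)

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(answer_f1("the big cat", "cat"))
```

A dataset-level score under this assumption would be the mean of `answer_f1` over all questions, usually taking the maximum over multiple reference answers where they exist.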