Long-Context Question Answering on SQuAD 2K
Top result: 84.9 Answer F1 Score (IN-CONTEXT)
[Chart: Answer F1 Score over time; date shown: Feb 6, 2026]
Updated 4d ago
Evaluation Results

| Method | Configuration | Date | Answer F1 Score |
| --- | --- | --- | --- |
| IN-CONTEXT | | 2026.02 | 84.9 |
| GEN ADAPTER | | 2026.02 | 39.9 |
| SHINE | Training Data Scale=Fu... | 2026.02 | 37.5 |
| SHINE | Training Data Scale=1/... | 2026.02 | 34.6 |
| SHINE | Training Data Scale=1/... | 2026.02 | 33.8 |
| NAIVE | | 2026.02 | 22 |