Long-Context Question Answering on SQuAD 1K
Top result: 85.1 Answer F1 (IN-CONTEXT)
[Chart: Answer F1 by date; data point at Feb 6, 2026]
Updated 4d ago
Evaluation Results
Method        Setting                     Date      Answer F1
IN-CONTEXT    -                           2026.02   85.1
SHINE         Training Data Scale=Fu...   2026.02   44.5
GEN ADAPTER   -                           2026.02   43
SHINE         Training Data Scale=1/...   2026.02   42.7
SHINE         Training Data Scale=1/...   2026.02   42.4
NAIVE         -                           2026.02   22