Document-grounded Question Answering on Long-Context Noise Dataset (800 samples) (test)
[Chart: AR by method; top entry CIP at 296, dated Dec 12, 2025. Selectable metrics: AR, CCS.]
Evaluation Results
| Method | Links | AR | CCS |
| --- | --- | --- | --- |
| CIP (Backbone Model=GPT-4o,...) | 2025.12 | 296 | 40 |
| RAG Enhancement (Backbone Model=GPT-4o,...) | 2025.12 | 51 | 12 |
| CoT (Backbone Model=GPT-4o,...) | 2025.12 | 42 | 8 |
| Direct Prompting (Backbone Model=GPT-4o,...) | 2025.12 | 35 | 5 |