Document-grounded Question Answering on Long-Context Noise Dataset (800 samples) (test)
[Chart: results by date (Dec 12, 2025); metrics AR, CCS, CIP]
Evaluation Results
Method               Configuration               Date      AR    CCS   CIP
(name not given)     Backbone Model=GPT-4o,...   2025.12   296   40    -
RAG Enhancement      Backbone Model=GPT-4o,...   2025.12   51    12    -
CoT                  Backbone Model=GPT-4o,...   2025.12   42    8     -
Direct Prompting     Backbone Model=GPT-4o,...   2025.12   35    5     -
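The page lists two scores per method. A quick way to compare the entries is to express each one as a multiple of the Direct Prompting baseline. The sketch below is illustrative only. It assumes the first listed score is AR and the second is CCS, since the page does not make the column assignment explicit. It also makes no assumption about whether higher is better. The label "(name not given)" is a placeholder for the top entry, whose method name is missing from the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    method: str
    ar: float   # assumed: first listed score
    ccs: float  # assumed: second listed score


# Scores as listed on the page; the AR/CCS assignment is an assumption.
ROWS = [
    Row("(name not given)", 296, 40),  # placeholder label for the unnamed top entry
    Row("RAG Enhancement", 51, 12),
    Row("CoT", 42, 8),
    Row("Direct Prompting", 35, 5),
]


def relative_to(baseline: str, rows: list[Row]) -> dict[str, tuple[float, float]]:
    """Return each method's (AR, CCS) as a multiple of the baseline's scores."""
    base = next(r for r in rows if r.method == baseline)
    return {r.method: (r.ar / base.ar, r.ccs / base.ccs) for r in rows}


if __name__ == "__main__":
    for method, (ar_x, ccs_x) in relative_to("Direct Prompting", ROWS).items():
        print(f"{method:20s} AR x{ar_x:.2f}  CCS x{ccs_x:.2f}")
```

Under these assumptions, CoT comes out at 1.20x the baseline AR (42 / 35). The unnamed top entry comes out at 8.00x the baseline CCS (40 / 5).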