Question Answering on TriviaQA 500 QA pairs
[Chart: Exact Match (EM). Top result: Naive RAG, 55.4, Oct 19, 2025. Updated 26d ago.]
Evaluation Results
Method      Details                    Date     Exact Match (EM)
Naive RAG   Backbone=Qwen-2.5-14B-...  2025.10  55.4
Ft. Agent   Backbone=Qwen-2.5-7B-I...  2025.10  54.9
Base Agent  Backbone=Qwen-2.5-14B-...  2025.10  54.1
Base LLM    Backbone=Qwen-2.5-14B-...  2025.10  50.9
Ft. Agent   Backbone=Qwen-2.5-3B-I...  2025.10  49.5
Naive RAG   Backbone=Mistral-NeMo-...  2025.10  49.5
Naive RAG   Backbone=Qwen-2.5-7B-I...  2025.10  48.8
Naive RAG   Backbone=Qwen-2.5-3B-I...  2025.10  45.9
Base Agent  Backbone=Qwen-2.5-7B-I...  2025.10  45.5
Base Agent  Backbone=Mistral-NeMo-...  2025.10  43.1
Base LLM    Backbone=Mistral-NeMo-...  2025.10  41.5
Base LLM    Backbone=Qwen-2.5-7B-I...  2025.10  39.2
Base Agent  Backbone=Qwen-2.5-3B-I...  2025.10  35.7
Base LLM    Backbone=Qwen-2.5-3B-I...  2025.10  31.4