Single-Hop Fact-based Reasoning on MAB FC-SH 262K v3 (test)
[Chart: Accuracy. Top result: Headline pipeline, 93 (May 31, 2026)]
Updated 1d ago
Evaluation Results
Method                        Configuration              Date     Accuracy
Headline pipeline             Backbone=gpt-4o            2026.05  93
Headline pipeline             Backbone=gpt-4o-mini       2026.05  82
Ablation A                    Backbone=gpt-4o-mini,...   2026.05  73
LLM-judgment baseline         Backbone=gpt-4o-mini       2026.05  61
GPT-4o (long-context)         Backbone=gpt-4o, Pipel...  2026.05  60
HippoRAG-v2 (best published)  Backbone=gpt-4o-mini       2026.05  54
BM25                          Backbone=gpt-4o-mini,...   2026.05  48
Zep / Graphiti                Backbone=gpt-4o-mini,...   2026.05  7