Fact Consolidation (Single-Hop) on MemoryAgentBench FC-SH Average
Top result: 0.948 Accuracy, Ablation C (gpt-4o backbone)
[Chart: Accuracy and Accuracy 95% CI Lower Bound over time; data point at May 31, 2026]
Updated 2d ago
Evaluation Results
| Method | Tags | Date | Accuracy | Accuracy 95% CI Lower Bound |
|---|---|---|---|---|
| Ablation C (gpt-4o backbone) | Pipeline=Ablation C (g... | 2026.05 | 0.948 | 0.921 |
| Ablation A (chunk-4096) | Pipeline=Ablation A (c... | 2026.05 | 0.808 | 0.766 |
| Headline | Pipeline=Headline, Bac... | 2026.05 | 0.78 | 0.737 |
| LLM-judgment baseline | Pipeline=LLM-judgment... | 2026.05 | 0.672 | 0.625 |