Fact Consolidation on MAB FC-SH (262K context) v3 (full)
[Chart: Accuracy (SubEM). Top entry: SH-conflict (fact + Python max), 93, May 31, 2026. Available views: Accuracy (SubEM), Gap vs Best Performance.]
Updated 1d ago
Evaluation Results
| Method | Configuration | Date | Accuracy (SubEM) | Gap vs Best Performance |
|---|---|---|---|---|
| SH-conflict (fact + Python max) | architecture=SH fact +... | 2026.05 | 93 | - |
| SH-conflict (fact + Python max) | architecture=SH fact +... | 2026.05 | 82 | - |
| SH-conflict (chunk4096 + Python max) | architecture=SH chunk4... | 2026.05 | 73 | - |
| GPT-4o | mode=long-context | 2026.05 | 60 | -33 |
| HippoRAG-v2 | retrieval=hippocampal... | 2026.05 | 54 | -39 |
| BM25 | retrieval=simple lexic... | 2026.05 | 48 | -45 |
| GPT-4o-mini | mode=long-context FIFO | 2026.05 | 45 | -48 |
| Claude-3.7-Sonnet | mode=long-context | 2026.05 | 43 | -50 |
| GPT-4.1-mini | mode=long-context | 2026.05 | 36 | -57 |
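
The "Gap vs Best Performance" values listed above appear to be each method's Accuracy (SubEM) minus the top score of 93. The following is a minimal sketch of that arithmetic, not the leaderboard's own code. The function name `gap_vs_best` and the row layout are hypothetical, and the scores are copied from the results above. The page shows "-" for the three SH-conflict rows, whereas this sketch computes a number for every row, which is an assumption.

```python
# Accuracy (SubEM) scores copied from the evaluation results above.
# Method names repeat, so rows are kept as a list rather than a dict.
rows = [
    ("SH-conflict (fact + Python max)", 93),
    ("SH-conflict (fact + Python max)", 82),
    ("SH-conflict (chunk4096 + Python max)", 73),
    ("GPT-4o", 60),
    ("HippoRAG-v2", 54),
    ("BM25", 48),
    ("GPT-4o-mini", 45),
    ("Claude-3.7-Sonnet", 43),
    ("GPT-4.1-mini", 36),
]


def gap_vs_best(rows):
    """Return (method, score, gap) tuples, where gap = score - best score.

    Hypothetical helper: assumes the gap is a plain difference against
    the highest score in the list.
    """
    best = max(score for _, score in rows)
    return [(name, score, score - best) for name, score in rows]


gaps = gap_vs_best(rows)
for name, score, gap in gaps:
    print(f"{name}: {score} ({gap:+d})")
```

Under this assumption the computed gaps for the six baseline rows match the listed values, from -33 for GPT-4o down to -57 for GPT-4.1-mini.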