Long-horizon dialogue on LOCOMO
[Chart: Success Rate over time. Highlighted point: MLMF, Success Rate 46.85, Mar 31, 2026. Available metrics: Success Rate, Context Usage, F1 Score.]
Updated 18d ago
Evaluation Results
| Method | Details | Date | Success Rate | Context Usage | F1 Score |
|---|---|---|---|---|---|
| MLMF | Efficiency=10.4× speedup | 2026.03 | 46.85 | 58.4 | - |
| MLMF | Ref=MLMF, Efficiency=1... | 2026.03 | 46.85 | 58.4 | 61.8 |
| Hu et al. | Reference=[10], Effici... | 2026.03 | 42 | 64.98 | - |
| Hu et al. | Ref=[10], Efficiency=9× | 2026.03 | 42 | 64.98 | - |