MEBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-entity Reasoning | MEBench Set3 (>100) | Comparison Accuracy: 94.6 | 5 |
| Multi-entity Reasoning | MEBench Set2 (11-100) | Comparison Accuracy: 95.2 | 5 |
| Multi-entity Reasoning | MEBench Set1 (0-10) | Comparison Accuracy: 96.8 | 5 |
| Multi-entity Reasoning | MEBench All sets | Comparison Acc: 93.4 | 5 |