MAB

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Code Generation | MAB code | TS | 69.6 | 18 |
| Fact Consolidation | MAB FC-SH (262K context) v3 (full) | Accuracy (SubEM) | 93 | 9 |
| Single-Hop Fact-based Reasoning | MAB FC-SH 262K v3 (test) | Accuracy | 93 | 8 |
| Fact Consolidation | MAB FC-MH 262K context v3 (full) | Accuracy (SubEM) | 27 | 7 |
| Multi-Hop Fact-based Reasoning | MAB FC-MH, 262K v3 (test) | Accuracy | 27 | 5 |
| Online Packet Scheduling with Deadlines (OPSD) | MAB 1-bounded | Competitive Ratio | 1 | 2 |
| epsilon-Best Arm Identification | MAB Multi-task | Non-adaptive Upper Bound | 1 | 1 |
| Regret Minimization | MAB with paid observations | - | - | 0 |