Full-Information Online Learning on FOL Gaussian rewards, Horizon Generalization [T=15 -> T=25] 1.0
[Chart: leaderboard plot of Max LR. Highlighted point: GPT-4o mini, 57.36, Nov 6, 2025. Selectable metrics: Max LR, Avg LR, Beta (β).]
Updated 2d ago
Evaluation Results

Method               Details                    Date     Max LR  Avg LR  Beta (β)
GPT-4o mini          Horizon (T)=25, Action...  2025.11  57.36   27.42   0.67
Trained GPT-4o mini  Horizon (T)=25, Action...  2025.11  53.29   25.63   0.62
FTRL                 Horizon (T)=25, Action...  2025.11  39.59   27.21   0.64