LB

Benchmarks

Task Name                Dataset Name   SOTA Result           Trend
Domain Reasoning         LB             Accuracy: 60          23
General Reasoning        LB V2          LB V2 Score: 27.42    14
General Reasoning        LB V1          LB V1 Score: 74.98    14
Long-context evaluation  LB v2 (ALL)    Accuracy (ALL): 38    13