
Multi-domain aggregate

Benchmarks

Task Name                        Dataset Name            SOTA Result           Trend
General Multi-domain Reasoning   Multi-Domain Aggregate  Average Score: 73.43  7
Model Calibration                Multi-domain aggregate  Agreement: 23.2       5