Aggregate Benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Defense | Aggregate Benchmarks | Harmful Score: 1.06 | 21 |
| Reasoning Performance Aggregation | Aggregate Benchmarks Code Math | Code Component Average: 56.2 | 12 |
| General Evaluation | Aggregate Benchmarks | Average Score: 69.26 | 12 |
| Video frame interpolation | Aggregate Benchmarks Average | PSNR: 33.76 | 9 |
| Multimodal Evaluation | Aggregate Benchmarks | Average Score: 100 | 3 |