
Aggregate Benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
General Evaluation | Aggregate Benchmarks | Average Score: 93.9 | 37
Jailbreak Defense | Aggregate Benchmarks | Harmful Score: 1.06 | 21
Multimodal Evaluation | Aggregate Benchmarks | Average Score: 100 | 16
Reasoning Performance Aggregation | Aggregate Benchmarks Code Math | Code Component Average: 56.2 | 12
Performance Compression Stability | Aggregate Benchmarks | Relative Performance: 100 | 10
Pruning Performance Evaluation | Aggregate Benchmarks Table 6 | Relative Performance (%): 100 | 10
Video Frame Interpolation | Aggregate Benchmarks Average | PSNR (dB): 33.76 | 9
Referring Video Object Segmentation and Point-to-Mask Tracking | Aggregate Benchmarks (DAVIS, MeViS-U, REVOS) | Overall Score: 41 | 6