benchmark suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Offline Policy Adaptation | Full Benchmark Suite Gravity and Morph shifts | Total Score: 820.8 | 7 |
| Visual instruction tuning | Benchmark Suite Aggregate | Average Score: 65.1 | 6 |
| Quantum architecture compilation | 36-circuit benchmark suite Combined | Geometric-mean Execution Time: 19,507.5 | 4 |
| Semantic Segmentation | 20-domain benchmark suite | h-mean: 58.05 | 4 |
| Unique bug detection | Full benchmark suite Very Large | TP: 6 | 3 |