
held-out

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-path Speculative Decoding | Held-out (test) | Average Block Efficiency: 6.84 | 24 |
| Bargaining | Held-Out (test) | Reward: 0.7664 | 16 |
| Tone Mapping | Held-out (test) | PSNR: 40.59 | 6 |
| Clinical case generation | Held-out (test) | BLEU-4: 18.98 | 6 |
| License Plate Recognition | held-out (test) | Plate Accuracy: 92.3 | 5 |
| binary classification | held-out n=2,332 (test) | Accuracy: 99.61 | 4 |
| Supply chain disruption forecasting | Held-out (test) | Brier Score: 0.0791 | 4 |