
In-distribution

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Error Detection | In-distribution (test) | AUC 0.8916 | 40 |
| Mathematical Reasoning | In-Distribution Avg | Average Score 45.6 | 29 |
| Debiasing Effectiveness | In-Distribution (ID) | Mean Effectiveness Score (ID) 10.2 | 16 |
| Reasoning step reduction | In-Distribution 5K corpus (test) | Savings Rate 47.5 | 9 |
| Text-to-Speech | In-distribution ID (test) | MOS 3.87 | 5 |
| Metasurface inverse design | In-Distribution (test) | SG 74 | 2 |