
Basic

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Robotic task planning and code generation | Basic | Success Rate (SR): 100 | 18 |
| Question Answering | basic (test) | IDK Score: 11.7 | 11 |
| Node-level Regression Uncertainty Quantification | Basic | PICP: 100 | 9 |
| Uncertainty Quantification | Basic Synthetic | Sharpness: 9.08 | 8 |
| Concept Erasure | Basic (test) | Unsafe Rate: 0 | 6 |
| ADS Testing | Basic | Execution Time (s): 63.3 | 3 |