
Materials

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Scientific Reasoning | Materials | Average Score @16 (1h): 78 | 16 |
| Question Answering | Materials held-out (test) | BERTScore: 71.36 | 6 |
| Outlier Detection | Materials | Precision: 87.5 | 3 |
| Cross-domain generalization | Materials (test) | Accuracy: 99.3 | 3 |
| Scientific Image Analysis | Materials (test) | Improvement (%): 4.3 | 2 |