Science

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-SQL | Science Benchmark | Execution Accuracy: 59.53 | 28 |
| Named Entity Recognition | Science | F1 Score: 79.4 | 19 |
| Task Routing | Science | Cost ($): 0.0276 | 15 |
| Taxonomy Expansion | Science single-parent hierarchies (test) | R@1: 46.1 | 13 |
| Multi-label Classification | Science | Ranking Loss: 0.9277 | 11 |
| Multi-label Feature Selection | Science | Macro F1 Score: 12.39 | 11 |
| Multi-label Feature Selection | Science | CV Score: 25.122 | 11 |
| Multi-label Feature Selection | Science (test) | HL: 3.44 | 11 |
| Multi-label Feature Selection | Science | OE Score: 96 | 11 |
| Multi-label Feature Selection | Science | AP: 5.26 | 11 |
| Taxonomy Expansion | Science (SCI) SemEval-2016 Task 13 | Chi-Squared: 13.2 | 10 |
| Scientific Reasoning | Science GPQA Diamond HLE (test) | GPQA Diamond Score: 63.1 | 6 |
| Science Reasoning | Science (out-of-distribution) | Accuracy: 65.12 | 6 |
| Task-Efficient Routing | Science Curated Task Benchmark 1.0 (test) | Average Cost: 0.0054 | 3 |
| Taxonomy Expansion | Science | Prec@1: 44.7 | 3 |
| Named Entity Recognition | Science English | F1 Score: 62.29 | 2 |