P3

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General Language Understanding | P3 v1 (unseen) | RTE Accuracy: 80.83 | 11 |
| Constrained Bayesian Optimization | P3 | Log10 Median Utility Gap: 1.28 | 10 |
| Investment decision alignment | P3 v1 (test) | Overall MSE: 1.59 | 6 |
| Word Sense Disambiguation | P3 | WiC Score: 53.3 | 5 |
| Coreference Resolution | P3 | Winogrande Score: 61.6 | 5 |
| Sentence Completion | P3 | COPA Accuracy: 85.3 | 5 |
| Natural Language Inference | P3 | RTE: 81.3 | 5 |
| Multiple-Choice Question Answering | P3 | Dream: 77.6 | 5 |
| Summarization | P3 | Mul. News Score: 7.8 | 5 |
| Sentiment Analysis | P3 | Emotion Accuracy: 49.4 | 5 |
| Minimal Problem Solving | P3.5P focal | Template Size (R×C): 2,043 | 4 |