
CB

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Inference | CB | Accuracy | 98.2 | 118 |
| Classification | CB | Accuracy | 91.1 | 46 |
| Natural Language Inference | CB SuperGLUE (test) | Accuracy | 91.43 | 33 |
| Natural Language Inference | CB | Average Accuracy | 91 | 29 |
| Natural Language Inference | CB val (test) | Accuracy | 94.6 | 19 |
| CommitmentBank | CB | Accuracy | 84.99 | 16 |
| Natural language inference | CB (test) | Accuracy | 89.3 | 13 |
| Text Classification | CB (test) | Macro-F1 | 64.6 | 10 |
| Natural Language Inference | CB | Total Communication Time (×10³ s) | 5.43 | 9 |
| Price Negotiation | CB Human Interaction | Success Rate | 48.3 | 8 |
| Price Negotiation | CB User Simulation | Success Rate (SR) | 57.5 | 8 |
| Natural Language Inference | CB SuperGLUE (test dev) | Accuracy | 84 | 8 |
| Natural Language Inference | CB | Accuracy | 87.5 | 8 |
| Four-class classification | CB (evaluation set) | Precision | 59.08 | 8 |
| Natural Language Inference | CB | F1 | 61.51 | 7 |
| Classification | CB UCI Repository (test) | Accuracy | 74.8 | 6 |
| Natural language inference | CB | Macro F1 Score | 0.537 | 6 |
| Natural Language Inference | CB | Acc (0-shot) | 82.1 | 6 |
| Natural Language Inference | CB (dev) | Accuracy | 0.84 | 6 |
| Natural Language Inference | CB 32 samples | F1 Score | 86.5 | 6 |
| Tabular Classification | CB (test) | AUROC | 88 | 4 |
Showing 21 of 21 rows