
CB

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Natural Language Inference | CB | Accuracy 98.2 | 129 |
| Classification | CB | Accuracy 91.1 | 70 |
| Natural Language Inference | CB SuperGLUE (test) | Accuracy 91.43 | 33 |
| Natural Language Inference | CB | Average Accuracy 91 | 29 |
| Natural Language Inference | CB | Loss 0.03 | 20 |
| Natural Language Inference | CB val (test) | Accuracy 94.6 | 19 |
| CommitmentBank | CB | Accuracy 84.99 | 16 |
| Natural language inference | CB (test) | Accuracy 89.3 | 13 |
| Text Classification | CB (test) | Macro-F1 64.6 | 10 |
| Natural Language Inference | CB | Total Communication Time (10^3 s) 5.43 | 9 |
| Thermal Image Restoration | CB EN | MUSIQ Score 70.89 | 8 |
| Price Negotiation | CB Human Interaction | Success Rate 48.3 | 8 |
| Price Negotiation | CB User Simulation | Success Rate (SR) 57.5 | 8 |
| Natural Language Inference | CB SuperGLUE (test dev) | Accuracy 84 | 8 |
| Natural Language Inference | CB | Accuracy 87.5 | 8 |
| Four-class classification | CB (evaluation set) | Precision 59.08 | 8 |
| Natural Language Inference | CB | F1 61.51 | 7 |
| Classification | CB UCI Repository (test) | Accuracy 74.8 | 6 |
| Natural language inference | CB | Macro F1 Score 0.537 | 6 |
| Natural Language Inference | CB | Acc (0-shot) 82.1 | 6 |
| Natural Language Inference | CB (dev) | Accuracy 0.84 | 6 |
| Natural Language Inference | CB 32 samples | F1 Score 86.5 | 6 |
| Tabular Classification | CB (test) | AUROC 88 | 4 |