Driving explanation classification on BDD-X Human 2
[Chart: Accuracy. Top result: RACE, 79.88 (Feb 2, 2026). Metrics: Accuracy, F1 Score (Macro), Cohen's Kappa, Fleiss' Kappa, Agreement Level Category]
Updated 3mo ago
Evaluation Results
| Method | Prompting Strategy | Links | Accuracy | F1 Score (Macro) | Cohen's Kappa | Fleiss' Kappa | Agreement Level Category |
|---|---|---|---|---|---|---|---|
| RACE | CoT-SC | 2026.02 | 79.88 | 51 | 76 | 83 | - |
| GPT-4o-mini | CoT | 2026.02 | 72.56 | 47 | 67 | 76 | - |
| GPT-3.5-Turbo | CoT | 2026.02 | 70.24 | 43 | 65 | 74 | - |
| GPT-4o | CoT | 2026.02 | 66 | 33 | 61 | 72 | - |
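For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how accuracy, macro-averaged F1, and Cohen's kappa are conventionally computed from two label sequences. It is not the benchmark's evaluation code. The function names and the toy labels are hypothetical, and the functions return values on a 0 to 1 scale (kappa can be negative), which may differ from the scale used on this page. Fleiss' kappa, which generalizes agreement to more than two raters, is omitted.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = accuracy(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(rater_a) | set(rater_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels, not taken from BDD-X.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

Macro averaging weights every class equally, so on imbalanced label sets a method's macro F1 can sit well below its accuracy.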