Pedagogical Dialogue Classification on MRBench (test)
[Chart: Mistake ID Acc. Highlighted point: S5: HPO-FT, 91 (Dec 27, 2025). Available metrics: Mistake ID Acc, Mistake ID F1, Guidance Acc, Guidance F1, Macro F1.]
Updated 3mo ago
Evaluation Results
| Method | Configuration | Links | Mistake ID Acc | Mistake ID F1 | Guidance Acc | Guidance F1 | Macro F1 |
|---|---|---|---|---|---|---|---|
| S5: HPO-FT | architecture=Hierarchi... | 2025.12 | 91 | 86 | 89 | 83 | 84.5 |
| S4: HPO-Base | architecture=Hierarchi... | 2025.12 | 90 | 84 | 87 | 81 | 82.5 |
| GPT-4o | mode=Zero-shot, parame... | 2025.12 | 88 | 82 | 85 | 80 | 81.2 |
| S3: Unstructured | architecture=Unstructu... | 2025.12 | 88 | 82 | 85 | 78 | 80 |
| S2: Cooperative | architecture=Collabora... | 2025.12 | 86 | 80 | 83 | 77 | 78.5 |
| Llama-70B | backbone=Llama-3-70B | 2025.12 | 85 | 78 | 81 | 74 | 76 |
| S1: Single | backbone=Llama-3-70B,... | 2025.12 | 79 | 71 | 76 | 68 | 69.5 |
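The page does not say how Macro F1 is computed. As a sanity check, the sketch below assumes it is the unweighted mean of Mistake ID F1 and Guidance F1 and compares that mean with the reported value for each method listed above. The function names (`mean_f1`, `compare`) and the tolerance are illustrative choices, not part of the benchmark.

```python
# Scores copied from the results above:
# (method, Mistake ID F1, Guidance F1, reported Macro F1)
ROWS = [
    ("S5: HPO-FT", 86.0, 83.0, 84.5),
    ("S4: HPO-Base", 84.0, 81.0, 82.5),
    ("GPT-4o", 82.0, 80.0, 81.2),
    ("S3: Unstructured", 82.0, 78.0, 80.0),
    ("S2: Cooperative", 80.0, 77.0, 78.5),
    ("Llama-70B", 78.0, 74.0, 76.0),
    ("S1: Single", 71.0, 68.0, 69.5),
]


def mean_f1(mistake_id_f1, guidance_f1):
    """Unweighted mean of the two per-task F1 scores (assumed definition)."""
    return (mistake_id_f1 + guidance_f1) / 2


def compare(rows, tol=0.05):
    """Return (method, computed, reported, matches) for each row.

    `tol` is a hypothetical tolerance for display rounding.
    """
    results = []
    for method, m_f1, g_f1, reported in rows:
        computed = mean_f1(m_f1, g_f1)
        matches = abs(computed - reported) <= tol
        results.append((method, computed, reported, matches))
    return results


if __name__ == "__main__":
    for method, computed, reported, matches in compare(ROWS):
        flag = "ok" if matches else "differs"
        print(f"{method:18s} computed={computed:5.1f} reported={reported:5.1f} {flag}")
```

Under this assumption every row matches the reported Macro F1 except GPT-4o, where the mean of the displayed F1 scores is 81.0 against a reported 81.2. That gap may come from rounding of the displayed per-task scores, but the page does not confirm this.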