Binary Classification on Seetrue (test)
Figure: Macro F1 Score over time. Best result: MT-RL-Judge, 83.67 (Mar 12, 2026).
Evaluation Results
Method          Training            Date      Macro F1 Score
MT-RL-Judge     Multi-task RL       2026.03   83.67
RL-Single       Single-task RL      2026.03   83.41
SFT-Unified     Unified multi...    2026.03   82.32
SFT-Single      Single-task SFT     2026.03   80.41
Off-the-shelf   -                   2026.03   80.01
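The leaderboard ranks methods by Macro F1 but does not say how the score was computed. Here is a minimal sketch that assumes the binary labels are encoded as 0 and 1 and that the reported numbers are macro F1 multiplied by 100. Macro F1 is the unweighted mean of the per-class F1 scores, where F1 = 2TP / (2TP + FP + FN). The helper names `f1_for_class` and `macro_f1` are hypothetical and do not come from any benchmark tooling.

```python
from typing import Sequence

# Assumption: labels are 0/1; the leaderboard's actual scoring script is not given.


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score when `cls` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0.0 if the class never appears in labels or predictions.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over the two labels {0, 1}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 1]
    # Scaled by 100 to match the leaderboard's presentation (assumed).
    print(round(100 * macro_f1(y_true, y_pred), 2))  # prints 62.5
```

Each class counts equally, so in this example the weaker negative-class F1 (0.5) pulls the macro score down to 62.5, even though the positive-class F1 is 0.75. When both classes occur in the data, this should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.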