Natural Language Inference on ContractNLI
[Chart: Macro-F1 over time. Best result: Full DRO, 84.5 Macro-F1 (Jun 16, 2025).]
Evaluation Results
| Method | Configuration | Date | Macro-F1 |
|---|---|---|---|
| Full DRO | Reward=DRO, Rollout-Gr... | 2025.06 | 84.5 |
| R3 | Reward=R3, Rollout-Gro... | 2025.06 | 80.8 |
| R3 | Reward=R3, Rollout-Gro... | 2025.06 | 80.6 |
| Avg Prob (RLPR) | Reward=Avg Prob (RLPR)... | 2025.06 | 78.2 |
| Rubric (RLER) | Reward=Rubric (RLER),... | 2025.06 | 75.4 |
| Avg Logprob (VeriFree) | Reward=Avg Logprob (Ve... | 2025.06 | 73.6 |
| RL-F1 | Reward=RL-F1, Rollout-... | 2025.06 | 73.4 |
| Base | Reward=Base | 2025.06 | 68.1 |