Overrefusal Evaluation on JBench-B
[Chart: RR by method. Top-listed result: Db as Alpaca, RR 92 (Mar 12, 2026).]
Evaluation Results
| Method | Setting | Date | RR |
|---|---|---|---|
| Db as Alpaca | Setting / Model=P-SFT,... | 2026.03 | 92 |
| Db as Alpaca | Setting / Model=RLVR,... | 2026.03 | 67 |
| Baseline | Setting / Model=P-SFT,... | 2026.03 | 56 |
| Db as Our Data | Setting / Model=P-SFT,... | 2026.03 | 39 |
| Db as Our Data | Setting / Model=RLVR,... | 2026.03 | 18 |
| Baseline | Setting / Model=RLVR,... | 2026.03 | 10 |