SOTA Alignment benchmarks and papers with code | Wizwand
Benchmarks
| Dataset Name | SOTA Method | Metric | Best Score | # Results | Last Updated |
|---|---|---|---|---|---|
| MT-Bench | Qwen2-72B-Instruct | MT-Bench Score | 9.12 | 49 | 1mo ago |
| Weather | MemPrompt | Align | 83.22 | 40 | 3mo ago |
| IFEval strict prompt | Nemotron Cascade-8B | pass@1 | 90.2 | 26 | 15d ago |
| UltraFeedback (test) | MetaAligner-7B | Honesty Score | 63.72 | 20 | 1d ago |
| HH-RLHF | AdaBoN | Estimated Score (EST) | 154 | 12 | 2mo ago |
| TruthfulQA MC2 | Qwen3-14B | Score | 77.72 | 10 | 15d ago |
| HalluBench | EVE (Ours-8B-iter4) | Accuracy | 63 | 10 | 1mo ago |
| IFEval | Qwen2.5-VL-72B | IFEval Score | 86.3 | 10 | 3mo ago |
| XSTest | CE + TMKL | Refusal Delta | 0.2 | 9 | 5d ago |
| MIA-Bench | EVE (Ours-8B-iter4) | Accuracy | 93.3 | 7 | 1mo ago |
| IFBench | Nemotron-Cascade 14B-Thinking | pass@1 | 41.7 | 7 | 3mo ago |
| ArenaHard | Gemini-2.5 Flash-Thinking | pass@1 | 95.7 | 7 | 3mo ago |
| Human Values | MAH-DPO | Helpfulness Score | 0.7165 | 6 | 1d ago |
| PDDLLM v1 (test) | Expert | Planning Success Rate | 100 | 6 | 3mo ago |
| Berlin2-10 real (test) | GVINS | MAE | 1.66 | 5 | 3mo ago |
| Berlin2 real (test) | SDP | MAE | 1.29 | 5 | 3mo ago |
| Berlin1-10 real (test) | SDP | MAE | 4.53 | 5 | 3mo ago |
| Berlin real 1 (test) | SDP | MAE | 2.64 | 5 | 3mo ago |
| Arena-Hard | Qwen2-72B-Instruct | Score | 48.1 | 5 | 3mo ago |
| MixEval | Qwen2-72B-Instruct | Score | 86.7 | 5 | 3mo ago |
| AlignBench v1 (test) | Qwen2-7B | Score | 7.21 | 5 | 3mo ago |
| MT-Bench v1 (test) | Qwen2-7B | MT-Bench Score | 8.41 | 5 | 3mo ago |
| Arena-Hard | SnapMLA | Hard Prompt Gemini Score | 70.4 | 4 | 3mo ago |
| AlignBench | Qwen2-72B-Instruct | Score | 8.27 | 4 | 3mo ago |
| MixEval v1 (test) | Qwen2-7B | Accuracy | 76.5 | 4 | 3mo ago |
Showing 25 of 28 rows