LLM Alignment on HH-RLHF 300 prompts
Top result: CARDS (mistral-7b) at 69.8 Win/Tie Rate vs Vanilla (GPT-4o). Updated Nov 5, 2025.
Evaluation Results

Method                 Model       Date     Win/Tie Rate vs Vanilla (GPT-4o)
CARDS                  mistral-7b  2025.11  69.8
CARDS                  llama-7b    2025.11  64.5
STARS                  mistral-7b  2025.11  64.5
DPO                    mistral-7b  2025.11  60.5
STARS                  llama-7b    2025.11  60.2
Tree-bon               mistral-7b  2025.11  59.2
RAIN                   mistral-7b  2025.11  59.0
ARGS                   mistral-7b  2025.11  58.8
DPO                    llama-7b    2025.11  56.4
Tree-bon               llama-7b    2025.11  55.2
RAIN                   llama-7b    2025.11  55.0
ARGS                   llama-7b    2025.11  54.8
Speculative-Decoding   mistral-7b  2025.11  50.4
Speculative-Decoding   llama-7b    2025.11  50.2
Vanilla LLM            llama-7b    2025.11  50.0
Vanilla LLM            mistral-7b  2025.11  50.0
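The scores above are win/tie rates against the vanilla baseline, with GPT-4o judging each of the 300 HH-RLHF prompts pairwise. The vanilla model scoring exactly 50 against itself suggests ties are counted as half a win. Under that assumption (the benchmark's actual scoring script is not shown here, and `win_tie_rate` is an illustrative name), the metric can be sketched as:

```python
def win_tie_rate(wins: int, ties: int, losses: int) -> float:
    """Win/tie rate in percent, with each tie counted as half a win.

    This tie-as-half convention is an assumption; it is consistent with
    the vanilla baseline scoring exactly 50 against itself (all ties).
    """
    total = wins + ties + losses
    return 100.0 * (wins + 0.5 * ties) / total

# An all-tie self-comparison lands at the 50.0 baseline:
print(round(win_tie_rate(0, 300, 0), 1))    # 50.0

# A hypothetical method winning 180 and tying 60 of 300 prompts:
print(round(win_tie_rate(180, 60, 60), 1))  # 70.0
```

On this convention, a score above 50 means the judge prefers the method's responses more often than not, so CARDS at 69.8 is a substantial margin over the 50.0 baseline.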