Pairwise Ranking on LFQA

Top result: Claude-3.5 Sonnet, 77.24 Pairwise Preference Accuracy

Updated 4mo ago

Evaluation Results

Method	Pairwise Preference Accuracy
Claude-3.5 Sonnet 2024.12	77.24
GPT-4o 2024.12	76.54
LMUNIT LLaMA3.1-70B-Decomposed-Weighted 2024.12	76.53
LMUNIT LLaMA3.1-70B 2024.12	76.15
SFR-LLaMA-3.1-70B-Judge 2024.12	75
LMUNIT LLaMA3.1-70B-Decomposed 2024.12	74.62
Prometheus-2-8x7B 2024.12	74.23
Prometheus-2-7B 2024.12	72.31
Prometheus-2-BGB-8x7B 2024.12	71.54
LMUNIT LLaMA3.1-8B 2024.12	71.54
SFR-LLaMA-3.1-8B-Judge 2024.12	68.85
Skywork-Critic-Llama-3.1-8B 2024.12	64.23
Llama-3-OffsetBias-8B 2024.12	63.08