Classification on InfoBench

92.8 Binary Accuracy

GPT-4o

Updated 4mo ago

Evaluation Results

Method	Date	Binary Accuracy
GPT-4o	2024.12	92.8
SFR-LLaMA-3.1-8B-Judge	2024.12	92.8
SFR-LLaMA-3.1-70B-Judge	2024.12	92.58
Claude-3.5 Sonnet	2024.12	91.58
LMUNIT LLaMA3.1-8B	2024.12	91.26
LMUNIT LLaMA3.1-70B	2024.12	89
LMUNIT LLaMA3.1-70B-Decomposed	2024.12	89
LMUNIT LLaMA3.1-70B-Decomposed-Weighted	2024.12	89
Prometheus-2-8x7B	2024.12	87.85
Prometheus-2-BGB-8x7B	2024.12	83.87
Llama-3-OffsetBias-8B	2024.12	72.15
Prometheus-2-7B	2024.12	48.6