SOTA LLM-as-a-Judge Robustness benchmarks and papers with code | Wizwand

LLM-as-a-Judge Robustness

Benchmarks

Dataset Name	SOTA Method	Metric	Score	Trend	Updated
Sage (Hard)	(not listed)	Factuality (IPI)	55.9	13	4mo ago
Sage Easy	Gemini-2.5-Pro	Factuality Error (IPI)	0.059	13	4mo ago
