Best-of-N evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Trend
RewardBench v2	PC2-based LLM-as-a-Judge	Accuracy: 58.69		2	2mo ago
RMB	PC2-based LLM-as-a-Judge	Accuracy: 59.69		2	2mo ago
