Aggregated LLM Evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Trend
8 Standard Benchmarks Aggregate		Average Accuracy: 73.7		5	1mo ago
Balanced Objective Aggregate Suite	CAMEL	Weighted Average Score: 53.2		5	2mo ago
