General Performance

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
Overall Aggregate (test)		Average Score	72.4	32	1mo ago
AlpacaEval		Winrate	98	25	4mo ago
Aggregated Benchmarks	General-Reasoner	Overall Average	49.76	22	1mo ago
VicunaEval		Winrate	96.3	21	4mo ago
Aggregate Across Math, Code, Chat	DFlash	Speedup	4.91	20	29d ago
Overall	UltraMix-190k	Overall Score	62.05	19	4mo ago
General Evaluation Suite	Qwen3 8B	Accuracy	73.8	17	4mo ago
Aggregated LLM Evaluation Suite	BTX	Average Score	47.9	10	4mo ago
Performance Bench Reasoning & Knowledge	DeepSeek-R1-Distill-Qwen-14B (Reasoning)	Average Score	78.37	9	4mo ago
Aggregated MMLU, HellaSwag, TruthfulQA, GSM8K, MATH, MBPP, HumanEval	Sens-Merging (DARE)	Average Score	40.35	9	4mo ago
