Helpfulness evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Score
SHP	Self-Examination	Helpfulness Score	89	40	18d ago
InstructionFollow		Accuracy	63.2	32	2mo ago
Manual Evaluation Set	MSelf-MOA	Average Helpfulness Score	4.57	24	4mo ago
Alpaca		Helpfulness Score	90	20	18d ago
MTBench	GPT-4o	Helpfulness	9.35	18	2mo ago
Figstep-audio Harmful-Safe	SARSteer	BRR	88.8	15	2mo ago
LINGUASAFE	SHARD	Win Rate	73.3	14	1mo ago
DNA	SHARD	Win Rate	70.1	14	1mo ago
AdvBench-audio Harmful-Safe	SARSteer	BRR Score	86.83	12	2mo ago
LLaVA-Bench	LLaVA-RLHF	Conversation Score	93.1	11	4mo ago
MM-Vet2 (test)		GPT-Eval Score	54.4	10	3mo ago
HH-RLHF helpful (test)	DeAL	Helpfulness Fraction	77	7	4mo ago
Helpfulness (evaluation set)	SafeDPO	Win Rate	84.05	5	4mo ago
Pause-and-think B	pause-and-think (Ours)	Conciseness	80.6	3	1mo ago
HHH (test)	DPO + OGPSA	HHH Score	90.68	3	4mo ago
