Pool

Benchmarks

Task Name	Dataset Name	SOTA Result
Hallucination Detection	Pool (test)	AuROC 0.8598	12
Trajectory-conditioned Video Generation	Pool	PSNR 35.64	6
Model Routing	Small Pool	Oracle Accuracy 92	6
Model Routing	Small pool	Mean per-model AUC 82.6	6
Content Localization	Pool HumanEdit and AiEdit average	Accuracy 98.46	5
Speech Editing Detection	Pool HumanEdit and AiEdit average	Acc 98.46	5
Trajectory-controlled video generation	Pool	Interaction Realism 4.4	2
