all

Benchmarks

Task Name	Dataset Name	SOTA Result
Intent Detection	All Posts and Comment Mean	Mean Score: 68.51	42
Legal Contract Revision	ALL Avg	CQ Score: 86.87	25
Mathematical Reasoning	All Average	Accuracy: 60	20
Machine Translation (English to Hindi)	All weighted average (test)	BLEU Score: 0.0675	14
Machine Translation (Hindi to English)	All weighted average (test)	BLEU Score: 10.28	14
Point Cloud Quality Assessment	ALL	PLCC: 0.913	12
Machine Translation	ALL Average of two language pairs in four directions wmt22-comet-da	COMET: 85.8	12
Organ Segmentation	All 121 classes v1 (test)	DSC: 90.49	10
Generative Searching	All-50K (test)	HR@1: 8.8	9
Word Sense Disambiguation	ALL (test)	F1 Score: 82	8
Image Classification	ALL (Hold-Out)	AUC: 93	7
Binary Graph Classification	All 169 Graphs (5-fold stratified CV)	Accuracy (Test): 75.9	6
Word Sense Linking	ALL FULL	Precision: 80.4	5
Video Action Recognition	All (Avg.)	Base Score: 65.5	5
Tabular Data Generation	All Default, Shoppers, Adult	Memory Ratio Improvement (%): 13.47	4
Wide-angle portrait correction	all (test)	Line Accuracy: 66.784	4
Mechanism design	All 31 shapes	Mean Chamfer Distance: 2.08	3
Aggregate Performance	All Average	Accuracy: 40.3	3
Word Sense Linking	ALL FULL (test)	Precision: 80.4	3
Anomaly Detection	All MVTec-AD, VisA, MPDD, BTAD combined	I-AUROC: 95.4	2
Classification Calibration	All 6 tabular	ΔNLL (%): 0.49	1
Decision Making	All Aggregated (UK-based participants)	Final Accuracy: 5.2	1
