General

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	Count
General Alignment	General Alignment and Coherence	Alignment Score	93.52	18
Machine Translation	general 2023 (test)	BLEU	32.64	16
Image Manipulation Detection	General Inference Speed Evaluation Images	FPS	31.7	16
Instance Erasure	General	FID (General)	13.24	13
Training Efficiency Analysis	General (train)	Relative Cost	1	12
Stability	General (MMLU, BBH, TyDiQA, BoolQ, PIQA, GSM8K)	General Score	55.75	9
Video Compression	General	Parameters (M)	18.34	9
Segmentation	General Efficiency Evaluation	Latency (ms)	7.3	9
Underwater Image Enhancement	General Architectural Comparison 1.0 (UEIB-T90)	PSNR	22.82	8
General Vision-Language Understanding	General	Avg Score	72.4	8
Average evaluation across 7 tasks	General (test)	BERTScore	76.5	8
Colon Polyp Segmentation	General	Parameters (M)	32.55	8
OOD Detection	General 300 samples (test)	Latency (µs)	3.2	7
Computational Complexity Analysis	General Model Complexity	Parameters	91,371	7
360-degree video saliency prediction	General	Params (M)	3.7	7
Circuit Localization	General	CPR	2.13	6
Model Efficiency Analysis	General 16 frames, 512 text tokens (inference)	FPS	20.74	6
Interactive Segmentation	General Efficiency Benchmarking	Parameters (MB)	84.89	6
Optical Flow Estimation	General Architecture Evaluation	Parameters (M)	0.074	5
Optimizer Property Comparison	General Theoretical Analysis	FLOPs per Step	1	5
Novel View Synthesis	General	MFLOPs / Pixel	13.77	5
Ending event prediction	General (test)	MRR	0.401	5
Speech Recognition	General Throughput Evaluation	Throughput (tokens/s)	168.9	4
Interactive World Modeling	General	-	-	0
Distributed Optimization	General First-order optimization setting	-	-	0

Showing 25 of 29 rows