
Tasks

Benchmarks

Task Name	Dataset Name	SOTA Result
Zero-shot Evaluation	7 tasks zero-shot	Mean Accuracy (Zero-shot) 72.79	223
Few-shot Text Classification	26 few-shot tasks Random -> Random transfer setting (test)	Accuracy 48.95	84
Few-shot Text Classification	26 few-shot tasks Non-Class -> Class transfer setting (test)	Accuracy 0.5275	84
Few-shot Text Classification	26 few-shot tasks Class -> Non-Class transfer setting (test)	Accuracy 45.54	84
Zero-shot Task Evaluation	tasks 0-shot	Accuracy 68.58	83
Zero-shot Classification	5 zero-shot tasks	Accuracy 79.48	55
Image Classification	14 Tasks Merge	Average Accuracy 94.3	51
Zero-shot Evaluation	Eight tasks zero-shot	Accuracy (Zero-shot) 60.31	29
Zero-shot Evaluation	Zero-shot Tasks	Task Avg Score 73.99	26
Zero-shot Task Evaluation	11 Tasks zero-shot	0-shot Average 68.66	26
Zero-shot Evaluation	Tasks Zero-shot (mean)	mAcc 76.57	25
Vascular Image Segmentation	11 tasks average	DSC 83.83	13
Humanoid loco-manipulation	66 unseen tasks (test)	Success Rate 181.58	10
Humanoid Loco-manipulation	350 tasks (train)	Success Rate 188.76	10
Zero-shot Evaluation	9 Zero-shot Tasks (BoolQ, HellaSwag, LAMBADA, OBQA, PIQA, SIQA, WinoGrande, ARC-Easy, ARC-Challenge)	0-shot Avg Accuracy 0.7111	9
Zero-Shot Classification	21 Tasks	21 Tasks Avg Score 56.4	9
Instruction-driven 3D layout generation	27 tasks across 9 scene types	Elo Rating 1,866	4
Agent Selection	300 tasks (evaluation)	Average Quality 75.1	4
Robotic Insertion	In-Distribution Tasks Average	Success Rate 86.67	3
Robot Manipulation	5 Tasks Unseen Target Setup (train)	Task 1 (Cola Can) Approach 100	3
Robot Manipulation	5 Tasks Unseen Distractor Setup (train)	Task 1 (cola can) App Success 100	3
Robot Manipulation	5 Tasks Basic Setup (train)	Task 1 (cola can) Approach Rate 100	3
