Tasks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Few-shot Text Classification | 26 few-shot tasks Random -> Random transfer setting (test) | Accuracy 48.95 | 84 |
| Few-shot Text Classification | 26 few-shot tasks Non-Class -> Class transfer setting (test) | Accuracy 0.5275 | 84 |
| Few-shot Text Classification | 26 few-shot tasks Class -> Non-Class transfer setting (test) | Accuracy 45.54 | 84 |
| Zero-shot Task Evaluation | tasks 0-shot | Accuracy 63.94 | 74 |
| Zero-shot Evaluation | 7 tasks zero-shot | Mean Accuracy (Zero-shot) 72.79 | 55 |
| Zero-shot Classification | 5 zero-shot tasks | Accuracy 79.48 | 55 |
| Image Classification | 14 Tasks Merge | Average Accuracy 94.3 | 51 |
| Zero-shot Evaluation | Eight tasks zero-shot | Accuracy (Zero-shot) 60.31 | 29 |
| Zero-shot Evaluation | Zero-shot Tasks | Task Avg Score 73.99 | 26 |
| Zero-shot Task Evaluation | 11 Tasks zero-shot | 0-shot Average 68.66 | 26 |
| Zero-shot Evaluation | Tasks Zero-shot (mean) | mAcc 76.57 | 25 |
| Vascular Image Segmentation | 11 tasks average | DSC 83.83 | 13 |
| Humanoid loco-manipulation | 66 unseen tasks (test) | Success Rate 181.58 | 10 |
| Humanoid Loco-manipulation | 350 tasks (train) | Success Rate 188.76 | 10 |
| Zero-shot Evaluation | 9 Zero-shot Tasks (BoolQ, HellaSwag, LAMBADA, OBQA, PIQA, SIQA, WinoGrande, ARC-Easy, ARC-Challenge) | 0-shot Avg Accuracy 0.7111 | 9 |
| Zero-Shot Classification | 21 Tasks | 21 Tasks Avg Score 56.4 | 9 |
| Instruction-driven 3D layout generation | 27 tasks across 9 scene types | Elo Rating 1,866 | 4 |
| Agent Selection | 300 tasks (evaluation) | Average Quality 75.1 | 4 |
| Robotic Insertion | In-Distribution Tasks Average | Success Rate 86.67 | 3 |
| Robot Manipulation | 5 Tasks Unseen Target Setup (train) | Task 1 (Cola Can) Approach 100 | 3 |
| Robot Manipulation | 5 Tasks Unseen Distractor Setup (train) | Task 1 (cola can) App Success 100 | 3 |
| Robot Manipulation | 5 Tasks Basic Setup (train) | Task 1 (cola can) Approach Rate 100 | 3 |