KitchenAmbig

Benchmarks

Task Name                    | Dataset Name                   | SOTA Result           | Trend
Introspective Planning       | KitchenAmbig (OOD)             | Average Accuracy 97.6 | 10
Introspective Planning       | KitchenAmbig (In-Distribution) | Average Accuracy 95.7 | 10
User-aligned task completion | KitchenAmbig (OOD)             | Accuracy 87.3         | 3
User-aligned task completion | KitchenAmbig (In-Distribution) | Accuracy 84           | 3