
TRIP-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Interactive Agent Task | TRIP-Bench (Hard LIT) | Loose Success: 4,400 | 32 |
| Interactive Agent Task | TRIP-Bench Overall | Loose Success Score: 45 | 16 |
| Interactive Agent Task | TRIP-Bench Hard PMR | Loose Success: 36 | 16 |
| Interactive Agent Task | TRIP-Bench (Hard FIT) | Loose Success Rate: 18 | 16 |
| Interactive Agent Task | TRIP-Bench Mid | Loose Success Rate: 55 | 16 |
| Interactive Agent Task | TRIP-Bench Easy | Loose Success Count: 71 | 16 |