Interactive Agent Task on TRIP-Bench (Hard LIT)

4,400 Loose Success

GPT-5.2

Feb 2, 2026
Updated 3mo ago

Evaluation Results

Method  Links
2026.02
4,4001,400
2026.02
3,600200
2026.02
2,8000
2026.02
1,600200
2026.02
1,6000
2026.02
1,2000
2026.02
1,0000
2026.02
1,0000
2026.02
1,0000
2026.02
1,0000
2026.02
8000
2026.02
6000
2026.02
260
2026.02
260
2026.02
220
2026.02
200
2026.02
180
2026.02
160
2026.02
120
2026.02
100
2026.02
80
2026.02
20
2026.02
20
2026.02
00
2026.02
00
2026.02
00
2026.02
00
2026.02
00
2026.02
00
2026.02
00
2026.02
00
2026.02
00