Task-oriented Dialogue on τ-bench 157 scenarios

45.5 Collaboration SR

GPT-4.1-mini

Sep 27, 2025
Updated 1mo ago

Evaluation Results

Method  Links
2025.09  45.5  41.7  39.5  45.1  45.4  100  91.6  86.8  98.9  99.8  -  -  -  -  -  -  -  -  -  -  -  -
2025.09  45.5  -  -  -  -  97.5  -  -  -  -  40.9  98.1  34.6  96.4  36.8  99.5  33.8  97.7  40  97.7  38.1  93.8
2025.09  41.4  36.8  32.3  37.6  39.3  100  88.9  78  90.8  94.9  -  -  -  -  -  -  -  -  -  -  -  -
2025.09  38.9  -  -  -  -  87.8  -  -  -  -  40.9  92.4  44.4  95.8  39.2  97.9  43.3  96  39.3  92.3  45.1  88.3
2025.09  27.9  26.6  20.4  24.8  30.1  100  95.3  73.1  88.9  107.9  -  -  -  -  -  -  -  -  -  -  -  -
2025.09  21.8  18.5  14.7  17.8  16.4  100  84.9  67.4  81.7  75.2  -  -  -  -  -  -  -  -  -  -  -  -
2025.09  12  10  6.8  8.8  8  100  83.3  56.7  72.5  66.7  -  -  -  -  -  -  -  -  -  -  -  -