
User Simulation Behavioral Alignment on tau2-bench Retail + Airline (test)

HL Score: 95.8

Humans

[Figure: score chart; May 13, 2026]
Updated 20d ago

Evaluation Results

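| Method (date) | HL Score | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 2026.05 | 95.8 | 62.3 | 79 | 94.9 | 97.8 | 88.6 | 92.2 | 93.4 |
| 2026.05 | 57 | 65.7 | 61.4 | 65.4 | 93.7 | 84 | 54.8 | 74.5 |
| 2026.05 | 41 | 32.3 | 36.7 | 42.7 | 85.3 | 43.8 | 33.2 | 51.2 |
| 2026.05 | 29.2 | 34 | 31.6 | 37.1 | 75.9 | 49.8 | 32.9 | 48.9 |
| 2026.05 | 12.3 | 14.6 | 13.5 | 32 | 88.7 | 53.4 | 54.4 | 57.1 |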