Persuasive Dialogue on ToM-BPD interactive evaluation

55.23 Win Rate: Identification

Qwen3-8B + TTBYS vs. GPT-5

[Chart: Win Rate: Identification over time; May 21, 2026]
Updated 12d ago

Evaluation Results

Method | Links
2026.05
55.23  26.47  34.56  25.12  42.78  26.31  35.44  29.18  28.62  31.04
2026.05
45.19  29.44  37.33  25.78  42.51  36.29  33.22  27.65  30.11  28.47
2026.05
34.87  50.12  30.45  28.33  32.21  38.76  32.58  28.14  29.04  30.22