Dialogue Annotation on TalkMoves (test)
[Chart: Cohen's Kappa (toggle: Accuracy) by date; top result Sonnet 4.6 (RAG_FINETUNED_UTT), 58, Apr 3, 2026]
Evaluation Results

| Method | Details | Date | Cohen's Kappa (×100) | Accuracy (%) |
|---|---|---|---|---|
| Sonnet 4.6 (RAG_FINETUNED_UTT) | Model=Sonnet 4.6, Cond... | 2026.04 | 58 | 70.8 |
| GPT-5.2 (RAG_FINETUNED_UTT) | Model=GPT-5.2, Conditi... | 2026.04 | 53.9 | 68.8 |
| Qwen3-32b (RAG_FINETUNED_UTT) | Model=Qwen3-32b, Condi... | 2026.04 | 52.6 | 66.7 |
| Sonnet 4.6 (RAG_FINETUNED_CHUNK) | Model=Sonnet 4.6, Cond... | 2026.04 | 50.1 | 64.7 |
| Sonnet 4.6 (RAG_NO_FINETUNE) | Model=Sonnet 4.6, Cond... | 2026.04 | 49.8 | 65.3 |
| GPT-5.2 (RAG_NO_FINETUNE) | Model=GPT-5.2, Conditi... | 2026.04 | 47.9 | 63.9 |
| GPT-5.2 (RAG_FINETUNED_CHUNK) | Model=GPT-5.2, Conditi... | 2026.04 | 45.4 | 62 |
| Qwen3-32b (RAG_NO_FINETUNE) | Model=Qwen3-32b, Condi... | 2026.04 | 44.2 | 61.7 |
| Qwen3-32b (RAG_FINETUNED_CHUNK) | Model=Qwen3-32b, Condi... | 2026.04 | 42.8 | 58.5 |
| Sonnet 4.6 (NO_RAG) | Model=Sonnet 4.6, Cond... | 2026.04 | 41.3 | 57.8 |
| GPT-5.2 (NO_RAG) | Model=GPT-5.2, Conditi... | 2026.04 | 31.5 | 50 |
| Qwen3-32b (NO_RAG) | Model=Qwen3-32b, Condi... | 2026.04 | 27.5 | 40.4 |
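
These results report two metrics. Accuracy is the raw share of utterances labeled the same as the gold annotation, while Cohen's Kappa corrects that observed agreement p_o for the agreement p_e expected by chance from the label frequencies: kappa = (p_o - p_e) / (1 - p_e). The Python below is a minimal sketch of that standard formula, not the benchmark's own scoring code. The labels "A", "B", "C" are hypothetical stand-ins for TalkMoves categories, and `implied_chance_agreement` is a hypothetical helper that rests on an assumption the page does not state: that the reported accuracy equals the p_o used to compute kappa (both scored on the same utterance-level labels against gold).

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two equal-length label sequences (e.g. gold vs. predicted)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement p_o: share of items given the same label (accuracy against gold).
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: product of each side's marginal label frequencies, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both sides used one identical label throughout; kappa is undefined, so this sketch returns 1.0.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def implied_chance_agreement(accuracy: float, kappa: float) -> float:
    """Hypothetical helper: solve kappa = (p_o - p_e) / (1 - p_e) for p_e.

    ASSUMES accuracy == p_o, with both values on a 0-1 scale and kappa < 1.
    """
    return (accuracy - kappa) / (1.0 - kappa)


# Hypothetical labels "A", "B", "C" stand in for TalkMoves categories.
gold = ["A", "A", "B", "B", "C", "C"]
pred = ["A", "B", "B", "B", "C", "A"]
print(round(cohens_kappa(gold, pred), 3))  # 0.5  (p_o = 4/6, p_e = 12/36)

# Sonnet 4.6 (RAG_FINETUNED_UTT): accuracy 70.8 and kappa 58, rescaled to 0-1.
print(round(implied_chance_agreement(0.708, 0.58), 3))  # 0.305
```

Under that assumption, the top entry's 70.8 accuracy and 58 kappa imply a chance-agreement level of roughly 0.305, which is why kappa sits well below accuracy for every row.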