Dialogue Annotation on Eedi (test)
Best result: Cohen's Kappa 0.743, Sonnet 4.6 (RAG_FINETUNED_UTT), Apr 3, 2026
Metrics: Cohen's Kappa, Accuracy
Updated 13d ago
Evaluation Results

| Method | Configuration | Date | Cohen's Kappa | Accuracy (%) |
|---|---|---|---|---|
| Sonnet 4.6 (RAG_FINETUNED_UTT) | Model=Sonnet 4.6, Cond... | 2026.04 | 0.743 | 84.7 |
| Qwen3-32b (RAG_FINETUNED_UTT) | Model=Qwen3-32b, Condi... | 2026.04 | 0.694 | 81.8 |
| GPT-5.2 (RAG_FINETUNED_UTT) | Model=GPT-5.2, Conditi... | 2026.04 | 0.659 | 81 |
| Sonnet 4.6 (RAG_NO_FINETUNE) | Model=Sonnet 4.6, Cond... | 2026.04 | 0.632 | 79.7 |
| Sonnet 4.6 (RAG_FINETUNED_CHUNK) | Model=Sonnet 4.6, Cond... | 2026.04 | 0.632 | 78 |
| Qwen3-32b (RAG_NO_FINETUNE) | Model=Qwen3-32b, Condi... | 2026.04 | 0.59 | 76.9 |
| Qwen3-32b (RAG_FINETUNED_CHUNK) | Model=Qwen3-32b, Condi... | 2026.04 | 0.545 | 72.5 |
| GPT-5.2 (RAG_FINETUNED_CHUNK) | Model=GPT-5.2, Conditi... | 2026.04 | 0.53 | 74 |
| GPT-5.2 (RAG_NO_FINETUNE) | Model=GPT-5.2, Conditi... | 2026.04 | 0.505 | 70.9 |
| Sonnet 4.6 (NO_RAG) | Model=Sonnet 4.6, Cond... | 2026.04 | 0.41 | 71.4 |
| GPT-5.2 (NO_RAG) | Model=GPT-5.2, Conditi... | 2026.04 | 0.351 | 57.5 |
| Qwen3-32b (NO_RAG) | Model=Qwen3-32b, Condi... | 2026.04 | 0.16 | 35.4 |
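
For readers unfamiliar with the primary metric, here is a minimal sketch of how Cohen's Kappa is conventionally computed from two label sequences (for example, human annotations and model predictions). The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the benchmark; the benchmark's actual label set and evaluation script are not shown on this page.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: probability both raters pick the same label if each
    # labeled independently according to their own label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels, for illustration only.
gold = ["question", "question", "hint", "hint", "feedback", "feedback"]
pred = ["question", "question", "hint", "feedback", "feedback", "feedback"]
kappa = cohen_kappa(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"kappa={kappa:.3f} accuracy={accuracy:.3f}")
```

Because Kappa discounts agreement expected by chance, it can order systems differently from raw accuracy, as with GPT-5.2 (RAG_FINETUNED_CHUNK) and Qwen3-32b (RAG_FINETUNED_CHUNK) in the results above.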