Instruction-following clustering on ECHR (C1)
Best V-measure: 85.79 (o3), Mar 6, 2026. Updated 23d ago.
Evaluation Results

| Method | Model Type | Date | V-measure |
|---|---|---|---|
| o3 | Reasoning M... | 2026.03 | 85.79 |
| C1-Qwen-14B | Our Model | 2026.03 | 84.8 |
| Gemini 2.5 Pro | Reasoning M... | 2026.03 | 84.14 |
| C1-Qwen-7B | Our Model | 2026.03 | 82.76 |
| QwQ-32B | Reasoning M... | 2026.03 | 72.63 |
| GPT-oss-120B | Reasoning M... | 2026.03 | 69.18 |
| Distill-Llama-70B | Reasoning M... | 2026.03 | 66.4 |
| GPT-4.1 | General Model | 2026.03 | 59.87 |
| GPT-4o | General Model | 2026.03 | 56.15 |
| DeepSeek-R1 | Reasoning M... | 2026.03 | 54.55 |
| Distill-Qwen-32B | Reasoning M... | 2026.03 | 53.15 |
| Llama-3.1-70B-Instruct | General Model | 2026.03 | 44.58 |
| Qwen2.5-72B-Instruct | General Model | 2026.03 | 44.03 |