Instruction-following clustering on REASONCLUSTER (Overall)
[Chart: V-measure (%); top entry: C1-Qwen-14B, 68.42, Mar 6, 2026]
Evaluation Results
Method                  Model Type      Date     V-measure (%)
C1-Qwen-14B             Our Model       2026.03  68.42
C1-Qwen-7B              Our Model       2026.03  66.54
o3                      Reasoning M...  2026.03  65.08
Gemini 2.5 Pro          Reasoning M...  2026.03  61.82
QwQ-32B                 Reasoning M...  2026.03  54.78
GPT-oss-120B            Reasoning M...  2026.03  52.31
Distill-Llama-70B       Reasoning M...  2026.03  45.12
DeepSeek-R1             Reasoning M...  2026.03  44.06
GPT-4.1                 General Model   2026.03  43.51
GPT-4o                  General Model   2026.03  41.26
Distill-Qwen-32B        Reasoning M...  2026.03  34.96
Llama-3.1-70B-Instruct  General Model   2026.03  31.55
Qwen2.5-72B-Instruct    General Model   2026.03  29.06