Instruction-following clustering on LMSY (C2)
[Chart: V-measure over time. Top result: 64.17 (o3), Mar 6, 2026.]
Evaluation Results
Method                    Model Type       Date      V-measure
o3                        Reasoning M...   2026.03   64.17
C1-Qwen-14B               Our Model        2026.03   62.65
C1-Qwen-7B                Our Model        2026.03   58.51
Gemini 2.5 Pro            Reasoning M...   2026.03   57.95
QwQ-32B                   Reasoning M...   2026.03   53.74
GPT-oss-120B              Reasoning M...   2026.03   52.77
Distill-Llama-70B         Reasoning M...   2026.03   47.76
DeepSeek-R1               Reasoning M...   2026.03   45.8
GPT-4.1                   General Model    2026.03   43.42
Distill-Qwen-32B          Reasoning M...   2026.03   39.45
GPT-4o                    General Model    2026.03   33.38
Llama-3.1-70B-Instruct    General Model    2026.03   28.97
Qwen2.5-72B-Instruct      General Model    2026.03   27.78