Vignette Completeness on CIRL-Vig
[Chart: Vignette Completeness. Highlighted point: OpenThinker3-7B, 45.8, Apr 21, 2026.]
Evaluation Results
Method               Condition   Date      Vignette Completeness
OpenThinker3-7B      Zero-shot   2026.04   45.8
Gemma-3-12B          SFT         2026.04   45.4
Gemma-3-12B          Zero-shot   2026.04   43.6
Phi-4                Zero-shot   2026.04   43
Qwen3.5-9B           SFT         2026.04   41.6
Qwen3.5-2B           Zero-shot   2026.04   41.6
Qwen3.5-4B           Zero-shot   2026.04   41.6
Qwen3.5-2B           SFT         2026.04   40.8
Qwen3.5-9B           Zero-shot   2026.04   38.7
OpenThinker3-7B      SFT         2026.04   34.1
Phi-4                SFT         2026.04   27.8
ContextReasoner-7B   Zero-shot   2026.04   13
Qwen3.5-9B           GRPO        2026.04   0
GPT-OSS-20B          Zero-shot   2026.04   0
GPT-OSS-20B          SFT         2026.04   0