Long-Context Reasoning on GPQA Diamond (out-of-distribution)
[Chart: Accuracy over time. Top result: VTC-R1, 48.5 accuracy. Date shown: Jan 29, 2026. Selectable metrics: Accuracy, Token Throughput, Latency.]
Evaluation Results
| Method | Architecture | Date | Accuracy | Token Throughput | Latency |
|---|---|---|---|---|---|
| VTC-R1 | Qwen3-VL-8B | 2026.01 | 48.5 | 9.77 | 9.57 |
| VTC-R1 | Glyph | 2026.01 | 46 | 10.73 | 6.96 |
| SFT | Glyph | 2026.01 | 38.4 | 13.91 | 8.35 |
| SFT | Qwen3-VL-8B | 2026.01 | 37.4 | 14.78 | 26.88 |
| TokenSkip | Glyph | 2026.01 | 35.9 | 15.45 | 9.93 |
| Base SFT | Glyph | 2026.01 | 26.3 | 19.74 | 14.43 |