Science Question Answering on ARC (Reusability Score Focus)
[Chart: Reusability Score by date. Highlighted point: Phi, 49.45, Feb 19, 2026.]
Evaluation Results
| Method | Setting | Date | Reusability Score |
|--------|---------|------|-------------------|
| Phi | Executor=Strong Comm. | 2026.02 | 49.45 |
| Llama | Executor=Strong Comm. | 2026.02 | 44.64 |
| Phi | Executor=Full Comm. | 2026.02 | 43.4 |
| Llama | Executor=Full Comm. | 2026.02 | 42.72 |
| Llama | Executor=Weak Comm. | 2026.02 | 40.79 |
| Phi | Executor=Weak Comm. | 2026.02 | 37.34 |
| Gemma | Executor=Strong Comm. | 2026.02 | 36.95 |
| Gemma | Executor=Full Comm. | 2026.02 | 35.88 |
| Gemma | Executor=Weak Comm. | 2026.02 | 34.81 |
| R1 | Executor=Strong Comm. | 2026.02 | 33.04 |
| R1 | Executor=Full Comm. | 2026.02 | 32.3 |
| R1 | Executor=Weak Comm. | 2026.02 | 31.55 |