Science Question Answering on ARC (Reusability Score Focus)
[Chart: Reusability Score by date (Feb 19, 2026). Top result: Phi, 49.45. Updated 1mo ago.]
Evaluation Results
| Method | Configuration | Date | Reusability Score |
| Phi | Executor=Strong Comm. | 2026.02 | 49.45 |
| Llama | Executor=Strong Comm. | 2026.02 | 44.64 |
| Phi | Executor=Full Comm. | 2026.02 | 43.4 |
| Llama | Executor=Full Comm. | 2026.02 | 42.72 |
| Llama | Executor=Weak Comm. | 2026.02 | 40.79 |
| Phi | Executor=Weak Comm. | 2026.02 | 37.34 |
| Gemma | Executor=Strong Comm. | 2026.02 | 36.95 |
| Gemma | Executor=Full Comm. | 2026.02 | 35.88 |
| Gemma | Executor=Weak Comm. | 2026.02 | 34.81 |
| R1 | Executor=Strong Comm. | 2026.02 | 33.04 |
| R1 | Executor=Full Comm. | 2026.02 | 32.3 |
| R1 | Executor=Weak Comm. | 2026.02 | 31.55 |