Math & Code Reasoning on SciQ
[Chart: Score. Top entry: DiReCT, 66.5 (May 29, 2026).]
Evaluation Results
Method             Backbone          Date      Score
DiReCT             Llama-1.1B        2026.05   66.5
InfoBatch          Llama-1.1B        2026.05   65.6
Perplexity-based   Llama-1.1B        2026.05   63.1
GradNorm (IS)      Llama-1.1B        2026.05   62.8
Uniform Sampling   Llama-1.1B        2026.05   61.2
Loss-based         Llama-1.1B        2026.05   60.5
DiReCT             GPT-2-Medium...   2026.05   48.5
InfoBatch          GPT-2-Medium...   2026.05   46
GradNorm (IS)      GPT-2-Medium...   2026.05   45.2
Perplexity-based   GPT-2-Medium...   2026.05   44.6
Loss-based         GPT-2-Medium...   2026.05   43.7
Uniform Sampling   GPT-2-Medium...   2026.05   43.1