Knowledge-intensive Question Answering on OpenScience
Top result: P-POTS+Mirror, 53.8 Pass@1 (Nov 22, 2025). Updated 12d ago.
Evaluation Results
| Method | Settings | Date | Pass@1 |
|---|---|---|---|
| P-POTS+Mirror | Backbone=LLaDA-8B-Inst... | 2025.11 | 53.8 |
| P-POTS+Mirror | Backbone=LLaDA-8B-Inst... | 2025.11 | 53 |
| P-POTS+Mirror | Backbone=LLaDA-8B-Inst... | 2025.11 | 52.53 |
| Qwen3-8B-Base | Seed=Avg | 2025.11 | 51.62 |
| P-POTS+Mirror | Backbone=LLaDA-8B-Inst... | 2025.11 | 50.8 |
| Clipped | Backbone=LLaDA-8B-Inst... | 2025.11 | 49.33 |
| P-POTS | Backbone=LLaDA-8B-Inst... | 2025.11 | 47.6 |
| P-POTS | Backbone=LLaDA-8B-Inst... | 2025.11 | 47.47 |
| EMA | Backbone=LLaDA-8B-Inst... | 2025.11 | 47.08 |
| P-POTS | Backbone=LLaDA-8B-Inst... | 2025.11 | 46.8 |
| Llama3.1-8B-Instruct | Seed=Avg | 2025.11 | 46.59 |
| Qwen2.5-7B-Instruct | Seed=Avg, Type=Auto-Re... | 2025.11 | 46.42 |
| MIRROR | Backbone=LLaDA-8B-Inst... | 2025.11 | 46.38 |
| Standard | Backbone=LLaDA-8B-Inst... | 2025.11 | 45.52 |
| ISAD | Backbone=LLaDA-8B-Inst... | 2025.11 | 45.4 |
| P-POTS | Backbone=LLaDA-8B-Inst... | 2025.11 | 45.34 |