Biological Reasoning on BioAlchemy
Chart: ProtocolQA Accuracy over time. Top result: GPT-OSS-20B, 52.78 (Apr 3, 2026). Other selectable metrics: SeqQA, Cloning Scenarios, PubMedQA, GPQA-Bio, and Overall Average Accuracy.
Evaluation Results
All scores are accuracy (%).

| Method | Date | ProtocolQA | SeqQA | Cloning Scenarios | PubMedQA | GPQA-Bio | Overall Average |
|---|---|---|---|---|---|---|---|
| GPT-OSS-20B | 2026.04 | 52.78 | 18.12 | 7.27 | 72.79 | 55.79 | 41.35 |
| BioAlchemist-8B (training data size = 150K) | 2026.04 | 46.2 | 22.32 | 15.15 | 68.32 | 62.11 | 42.82 |
| BioAlchemist-8B (training data size = 50K) | 2026.04 | 45.83 | 18.73 | 11.52 | 68.09 | 57.89 | 40.41 |
| Qwen3-8B | 2026.04 | 42.69 | 8.42 | 5.76 | 69 | 42.63 | 33.7 |
| DeepSeek-R1-Llama-8B | 2026.04 | 33.61 | 4.97 | 10.91 | 26.89 | 5.26 | 16.33 |
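The leaderboard does not state how Overall Average Accuracy is computed. It appears to be the unweighted mean of the five per-task accuracies, because every row reproduces to two decimals under that assumption. For example, GPT-OSS-20B gives (52.78 + 18.12 + 7.27 + 72.79 + 55.79) / 5 = 41.35. The minimal sketch below checks that assumption against the reported column, re-ranks the methods, and compares the two BioAlchemist-8B training-data sizes task by task. The helper names (`overall_average`, `ranking`, `per_task_delta`) and the short method labels are hypothetical and exist only for illustration.

```python
# Per-task accuracies (%) copied from the Evaluation Results table.
# Task order: ProtocolQA, SeqQA, Cloning Scenarios, PubMedQA, GPQA-Bio.
TASKS = ["ProtocolQA", "SeqQA", "Cloning Scenarios", "PubMedQA", "GPQA-Bio"]

RESULTS = {
    "GPT-OSS-20B": [52.78, 18.12, 7.27, 72.79, 55.79],
    "BioAlchemist-8B (150K)": [46.2, 22.32, 15.15, 68.32, 62.11],
    "BioAlchemist-8B (50K)": [45.83, 18.73, 11.52, 68.09, 57.89],
    "Qwen3-8B": [42.69, 8.42, 5.76, 69, 42.63],
    "DeepSeek-R1-Llama-8B": [33.61, 4.97, 10.91, 26.89, 5.26],
}

# Overall Average Accuracy as reported, used to test the assumed formula.
REPORTED_OVERALL = {
    "GPT-OSS-20B": 41.35,
    "BioAlchemist-8B (150K)": 42.82,
    "BioAlchemist-8B (50K)": 40.41,
    "Qwen3-8B": 33.7,
    "DeepSeek-R1-Llama-8B": 16.33,
}


def overall_average(scores):
    """Assumed formula: unweighted mean of the task accuracies, 2 decimals."""
    return round(sum(scores) / len(scores), 2)


def ranking(results):
    """Method names sorted by overall average, best first."""
    return sorted(results, key=lambda m: overall_average(results[m]), reverse=True)


def per_task_delta(results, a, b):
    """Per-task accuracy difference (a - b) in percentage points."""
    return {task: round(x - y, 2) for task, x, y in zip(TASKS, results[a], results[b])}


if __name__ == "__main__":
    for method in ranking(RESULTS):
        computed = overall_average(RESULTS[method])
        print(f"{method:24s} {computed:6.2f}  (reported {REPORTED_OVERALL[method]:.2f})")
    print(per_task_delta(RESULTS, "BioAlchemist-8B (150K)", "BioAlchemist-8B (50K)"))
```

When run as a script, it lists BioAlchemist-8B (150K) first at 42.82, ahead of GPT-OSS-20B at 41.35, even though GPT-OSS-20B has the highest ProtocolQA score. The per-task delta shows the 150K variant ahead of the 50K variant on all five tasks. The margins range from 0.23 points (PubMedQA) to 4.22 points (GPQA-Bio).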