
Scientific Reasoning & QA on Science & QA Domain Multiple Datasets

Average Accuracy: 4.04

DVPO

[Chart: Average Accuracy over time; axis ticks 2.9168, 3.2084, 3.5, 3.7916; data point dated Dec 3, 2025]
Updated 1mo ago

Evaluation Results

Date      Average Accuracy
2025.12   4.04
2025.12   3.72
2025.12   3.55
2025.12   3.3
2025.12   3.22
2025.12   3.16
2025.12   2.96