General Reasoning & QA on All Evaluated Datasets

Average Accuracy: 39.7

DVPO

[Chart: Average Accuracy, y-axis range approx. 34.44 to 38.54; data point dated Dec 3, 2025]
Updated 1mo ago

Evaluation Results

Date      Average Accuracy
2025.12   39.7
2025.12   36.75
2025.12   36.65
2025.12   35.98
2025.12   35.28
2025.12   34.72
2025.12   34.64