Natural Language Processing Reasoning on BBH-NLP (test)

66.2 Accuracy (ACC)

KP

[Chart: axis ticks 56.216, 58.808, 61.4, 63.992; date Sep 28, 2025]
Updated 1d ago

Evaluation Results

Method  Links
2025.09   66.2   -     100    0
2025.09   63.6   -     74.9   0.87
2025.09   63.5   -     100    0
2025.09   63     -     74     0.89
2025.09   63     -     83.7   0.39
2025.09   62.8   -     73.2   0.89
2025.09   62.4   -     86.3   0.47
2025.09   61.2   -     69.9   0.92
          61.1   5.1   69.8   0.92
2025.09   61.1   -     69.8   0.92
2025.09   61     -     69.3   0.92
2025.09   60.2   -     73.3   0.41
2025.09   60.2   -     75.3   0.46
          59.6   3.9   73.9   0.41
2025.09   59.6   -     73.9   0.41
2025.09   56.6   -     69.6   0.51