
Reasoning on FLenQA 2000 tokens

Accuracy: 52.5

LIME+1

[Chart: accuracy over time; y-axis ticks 39.812, 43.106, 46.4, 49.694; x-axis Dec 8, 2025]
Updated 1mo ago

Evaluation Results

Date       Accuracy
2025.12    52.5
2025.12    52.5
2025.12    52.5
2025.12    44.3
2025.12    44.3
2025.12    44.3
2025.12    40.3
2025.12    40.3
2025.12    40.3