Task addition on Task Arithmetic Benchmark (test)

88.5 Avg Absolute Accuracy

Linear. FT

[Chart omitted; date shown: May 22, 2023]
Updated 3mo ago

Evaluation Results

Date      Avg Absolute Accuracy   (unlabeled)
2023.05   88.5                    93.5
2023.05   85.1                    88.8
2023.05   81.3                    86
2023.05   76.5                    85.4
2023.05   75.5                    80
2023.05   75.2                    90
2023.05   71.4                    76.5
2023.05   65                      85.2
2023.05   64.4                    -
2023.05   57.1                    81.9
2023.05   55.2                    -
2023.05   48.4                    -