Model Performance Evaluation on Table 1 Aggregate excluding Human-Internal

Average Score: 79.74

LMUNIT LLaMA3.1-70B-Decomposed-Weighted

Dec 17, 2024
Updated 1mo ago

Evaluation Results

Date      Score
2024.12   79.74
2024.12   79.29
2024.12   78.78
2024.12   78.26
2024.12   77.59
-         76.43
2024.12   74.1
2024.12   72.27
2024.12   68.52
-         59.74
2024.12   57.98
2024.12   53.85