
Multi-task Evaluation on Aggregate (HealthQA, ARC-C, PopQA, Squad1, ASQA)

Best average score: 62.8 (AMATA 8B)


Evaluation Results

Method	Date	Average Score
AMATA 8B	2026.05	62.8
GiGPO 8B	2026.05	60.1
SPA-RL 8B	2026.05	59.76
SMART 8B	2026.05	59.29
SelfRag 8B	2026.05	51.81
GPT4o	2026.05	49.17
Llama-3-Ins.8B	2026.05	35.25
RADIT 8B	2026.05	35.2
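The sketch below ranks the methods in the results table and computes each one's gap to the leader on the aggregate score. It assumes only the numbers shown in the table. The per-task scores for HealthQA, ARC-C, PopQA, Squad1 and ASQA are not listed, so nothing here says how the aggregate was formed. `RESULTS` and `gaps_to_leader` are hypothetical names used for illustration.

```python
# Aggregate scores copied from the results table; the per-task breakdown is not available here.
RESULTS = {
    "AMATA 8B": 62.8,
    "GiGPO 8B": 60.1,
    "SPA-RL 8B": 59.76,
    "SMART 8B": 59.29,
    "SelfRag 8B": 51.81,
    "GPT4o": 49.17,
    "Llama-3-Ins.8B": 35.25,
    "RADIT 8B": 35.2,
}


def gaps_to_leader(results: dict[str, float]) -> list[tuple[str, float, float]]:
    """Hypothetical helper: return (method, score, gap to best score), sorted best-first."""
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    best = ranked[0][1]
    # Round the gap to two decimals to hide float subtraction noise (e.g. 62.8 - 60.1).
    return [(name, score, round(best - score, 2)) for name, score in ranked]


if __name__ == "__main__":
    for name, score, gap in gaps_to_leader(RESULTS):
        print(f"{name:<16} {score:>6.2f}  -{gap:.2f}")
```

On these numbers, AMATA 8B leads GiGPO 8B by 2.7 points on the aggregate. Whether that margin is uniform across the five tasks cannot be determined from this table alone.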