Compositional Generalization on Evaluation Dataset (Fold 3 Seen)

Score: 66.69

LLaMA-3-8B + COGLM

Updated 4mo ago

Evaluation Results

Method	Links	Score
LLaMA-3-8B + COGLM 2026.01		66.69
Mistral-7B + COGLM 2026.01		65.1
Mistral-7B 2026.01		62.64
Qwen2.5-7B + COGLM 2026.01		61.73
Gemma-7B + COGLM 2026.01		61.7
Qwen2.5-7B 2026.01		61.09
Gemma-7B 2026.01		60.43
LLaMA-3-8B 2026.01		59.72
DeepSeek V3 + FS* 2026.01		46.62
DeepSeek-7B + COGLM 2026.01		45.47
Claude 3.5 Sonnet + FS* 2026.01		45.04
DeepSeek V3 2026.01		42.46
Claude 3.5 Sonnet 2026.01		41.59
GPT-4o + FS* 2026.01		40.05
DeepSeek-7B 2026.01		39.02
GPT-4o 2026.01		35.58
Llama 3 (70B) + FS* 2026.01		33.15
Llama 3 (70B) 2026.01		22.31