Compositional Generalization on Evaluation Dataset (Fold 2 Seen)

Score: 63.63

Gemma-7B + COGLM

Updated 4mo ago

Evaluation Results

Method	Links	Score
Gemma-7B + COGLM 2026.01		63.63
LLaMA-3-8B + COGLM 2026.01		63.46
Mistral-7B + COGLM 2026.01		63.17
Mistral-7B 2026.01		62.48
Qwen2.5-7B + COGLM 2026.01		61.85
LLaMA-3-8B 2026.01		61.81
Qwen2.5-7B 2026.01		61.61
Gemma-7B 2026.01		57.37
DeepSeek-7B + COGLM 2026.01		52.04
DeepSeek-7B 2026.01		49.52
DeepSeek V3 2026.01		45.27
DeepSeek V3 + FS* 2026.01		44.18
Claude 3.5 Sonnet + FS* 2026.01		43.18
Claude 3.5 Sonnet 2026.01		41.11
GPT-4o 2026.01		40.00
GPT-4o + FS* 2026.01		39.12
Llama 3 (70B) + FS* 2026.01		33.49
Llama 3 (70B) 2026.01		23.66