Compositional Generalization on Evaluation Dataset Unseen (Fold 2)

Score: 50

Mistral-7B

Updated 4mo ago

Evaluation Results

Method	Links	Score
Mistral-7B 2026.01		50
Qwen2.5-7B 2026.01		46.36
LLaMA-3-8B + COGLM 2026.01		45.66
DeepSeek V3 + FS* 2026.01		44.51
LLaMA-3-8B 2026.01		44.22
Gemma-7B + COGLM 2026.01		44
Qwen2.5-7B + COGLM 2026.01		43.87
Mistral-7B + COGLM 2026.01		43.61
GPT-4o + FS* 2026.01		41.1
Claude 3.5 Sonnet + FS* 2026.01		40.35
DeepSeek V3 2026.01		40.23
Gemma-7B 2026.01		39.85
DeepSeek-7B + COGLM 2026.01		38.69
Claude 3.5 Sonnet 2026.01		36.88
DeepSeek-7B 2026.01		34.17
Llama 3 (70B) + FS* 2026.01		33.3
GPT-4o 2026.01		32.51
Llama 3 (70B) 2026.01		18.33