
Tool-augmented reasoning on MINT-Bench

Success Rate (Turn 1): 9.85

LLAMA PRO - INSTRUCT

Updated 4mo ago

Evaluation Results

Method	Links	Success Rate (Turn 1)	Turn 2	Turn 3	Turn 4	Turn 5	Avg.
LLAMA PRO - INSTRUCT 2024.01		9.85	12.65	12.8	11.95	14.68	12.38
Mistral-Instruct-v0.1 2024.01		1.54	12.12	13.31	14.16	13.99	11.02
LLaMA2-7B-Chat 2024.01		1.02	4.27	9.77	6.48	7.34	5.77
CodeLLaMA-7B-Instruct 2024.01		0.34	7.85	10.24	9.73	8.7	7.37
AgentLM-7B 2024.01		0	4.44	5.29	6.48	7.34	4.71
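
A quick consistency check on the table above: in each row, the last numeric value appears to match the mean of the five values before it to within 0.01. The page does not state this relationship, so it is an assumption here, and the names `rows`, `mean`, and `check_rows` are illustrative only. A minimal Python sketch:

```python
# Values copied from the results table: five scores, then the last column.
rows = {
    "LLAMA PRO - INSTRUCT": ([9.85, 12.65, 12.8, 11.95, 14.68], 12.38),
    "Mistral-Instruct-v0.1": ([1.54, 12.12, 13.31, 14.16, 13.99], 11.02),
    "LLaMA2-7B-Chat": ([1.02, 4.27, 9.77, 6.48, 7.34], 5.77),
    "CodeLLaMA-7B-Instruct": ([0.34, 7.85, 10.24, 9.73, 8.7], 7.37),
    "AgentLM-7B": ([0, 4.44, 5.29, 6.48, 7.34], 4.71),
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def check_rows(table, tol=0.01):
    """Map each method to True if its last column is within tol of the mean
    of its first five scores (assumed relationship, not stated on the page)."""
    return {
        name: abs(mean(scores) - reported) < tol
        for name, (scores, reported) in table.items()
    }


if __name__ == "__main__":
    for name, ok in check_rows(rows).items():
        print(f"{name}: last column consistent with mean = {ok}")
```

The tolerance of 0.01 allows for the last column having been truncated or computed from unrounded scores.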