Scientific Claim Verification on SciFact

Best Accuracy: 40.5 (mistral-large)

Updated 4mo ago

Evaluation Results

Method	Accuracy	Metric 2	Metric 3	Metric 4	Metric 5
mistral-large 2026.02	40.5	19.2	22.1	50.6	54.9
deepseek-chat 2026.02	40.0	19.1	22.4	50.6	56.1
llama-3.3-70b 2026.02	40.0	19.0	22.5	50.6	55.6
flan-t5-large 2026.02	39.5	18.9	22.8	49.6	55.1
gpt-4o-mini 2026.02	39.5	18.9	22.3	50.6	56.1
gpt-5.2-chat 2026.02	39.5	18.9	22.6	50.6	55.9
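The page reports Accuracy as its headline metric without defining it. A minimal sketch, assuming Accuracy is the percentage of claims whose predicted verdict matches the gold verdict over a three-way label set. The label names (`SUPPORT`, `CONTRADICT`, `NOT_ENOUGH_INFO`) and the `claim_accuracy` helper are hypothetical illustrations, not the leaderboard's or SciFact's official evaluation code.

```python
from typing import Sequence

# Hypothetical three-way label set (assumption, not taken from the leaderboard).
LABELS = {"SUPPORT", "CONTRADICT", "NOT_ENOUGH_INFO"}


def claim_accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Return the percentage of claims whose predicted verdict equals the gold verdict."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one claim")
    # Reject labels outside the assumed label set.
    for label in (*gold, *pred):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    gold = ["SUPPORT", "CONTRADICT", "NOT_ENOUGH_INFO", "SUPPORT"]
    pred = ["SUPPORT", "SUPPORT", "NOT_ENOUGH_INFO", "CONTRADICT"]
    print(claim_accuracy(gold, pred))  # prints 50.0
```

Under this assumption, a score of 40.5 would mean roughly 40.5% of evaluated claims received the correct verdict.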