
Open Question Answering on LegalMC4 (test)

Metric: LLM Factual Correctness
Top result: 77.2, GPT-5 (min. reasoning)


Evaluation Results

Method	Date	LLM Factual Correctness
GPT-5 (min. reasoning)	2026.01	77.2
GPT-5-mini (min. reasoning)	2026.01	70.1
LLaMA 3.1 (8B)	2026.01	55.4
Gemma 3 (12B)	2026.01	54.5
LLaMA 3.1 (8B)	2026.01	43
Gemma 3 (12B)	2026.01	41.8
Gemma 3 (12B)	2026.01	38.7
LLaMA 3.1 (8B)	2026.01	35.4