RAG Question Answering on SCORE domain-specific hazard response dataset (test)

Specificity: 61

GPT-4o

Updated 5mo ago

Evaluation Results

Method	Links
GPT-4o 2026.02		61	89	60	69
GPT-4o 2026.02		58	89	66	73
Gemini 2.5 2026.02		55	87	87	68
Gemini 2.5 2026.02		45	86	66	83