Hallucination Detection on Earth Observation
Top result: GPT-5 nano, F1 Score 90.94
[Figure: F1 Score chart; date shown: Mar 20, 2026]
Evaluation Results
Method              Size (B)   Date      F1 Score   Rank
GPT-5 nano          20*        2026.03   90.94      5.33
GPT OSS             120A5      2026.03   89.92      4.83
EVE-Instruct        24         2026.03   84.7       3.5
Qwen3               235-A22    2026.03   84.4       2.17
MiniMax m2.5        230A10     2026.03   83.77      5.17
GPT-4.1             1800*      2026.03   81.58      2.83
Mistral Medium 3.1  200*       2026.03   76.89      4.17