Open-Ended Question Answering (with Context) on Earth Observation
Top result: GPT-4.1, Judge Score 86.65 (Mar 20, 2026)
[Figure: Judge Score chart; available metrics: Judge Score, EVE WR, Rank]
Evaluation Results
| Method | Size (B) | Date | Judge Score | EVE WR | Rank |
|---|---|---|---|---|---|
| GPT-4.1 | 1800* | 2026.03 | 86.65 | 49.22 | 2.83 |
| Mistral Medium 3.1 | 200* | 2026.03 | 86.44 | 50.99 | 4.17 |
| Qwen3 | 235-A22 | 2026.03 | 86.1 | 50.09 | 2.17 |
| GPT OSS | 120A5 | 2026.03 | 84.8 | 50.7 | 4.83 |
| GPT-5 nano | 20* | 2026.03 | 84.4 | 48.6 | 5.33 |
| MiniMax m2.5 | 230A10 | 2026.03 | 81.57 | 51.2 | 5.17 |
| EVE-Instruct | 24 | 2026.03 | 78.28 | - | 3.5 |