Extreme Weather Detection on Location answer-based questions
Best F1 Score: 3.85 (ZEPHYRUS-REFLECTIVE), Oct 5, 2025. Updated 1mo ago.
Evaluation Results
| Method | LLM | Date | F1 Score |
| --- | --- | --- | --- |
| ZEPHYRUS-REFLECTIVE | gemini-2.5-flash | 2025.10 | 3.85 |
| ZEPHYRUS-DIRECT | gpt-5-mini | 2025.10 | 1.83 |
| ZEPHYRUS-DIRECT | gemini-2.5-flash | 2025.10 | 1.79 |
| ZEPHYRUS-REFLECTIVE | gpt-5.2 | 2025.10 | 0 |
| ZEPHYRUS-DIRECT | gpt-5.2 | 2025.10 | 0 |
| Text Only LLM | gpt-5.2 | 2025.10 | 0 |
| ZEPHYRUS-REFLECTIVE | gpt-5-mini | 2025.10 | 0 |
| Text Only LLM | gpt-5-mini | 2025.10 | 0 |
| Text Only LLM | gemini-2.5-flash | 2025.10 | 0 |
| ZEPHYRUS-REFLECTIVE | gpt-oss-120b | 2025.10 | 0 |
| ZEPHYRUS-DIRECT | gpt-oss-120b | 2025.10 | 0 |
| Text Only LLM | gpt-oss-120b | 2025.10 | 0 |
| ZEPHYRUS-REFLECTIVE | qwen3-30b | 2025.10 | 0 |
| ZEPHYRUS-DIRECT | qwen3-30b | 2025.10 | 0 |
| Text Only LLM | qwen3-30b | 2025.10 | 0 |