Behavioral Prediction on Woodward-like Behavioral Prediction Non-animate
[Chart: Accuracy over time. Best result: Claude Opus 4.6, 94 (Mar 5, 2025).]
Updated 2mo ago
Evaluation Results
Method               Date      Accuracy
Claude Opus 4.6      2025.03   94
Gemini 3.1 Pro       2025.03   90
Qwen 3.5 Plus        2025.03   88
Humans               2025.03   81
GPT-4o               2025.03   60
GPT-5.2              2025.03   58
Claude 3.5 Sonnet    2025.03   50
Qwen VL Max          2025.03   49
Gemini 3.1 Flash     2025.03   47