General Reasoning and Knowledge on HLE
Top score: 41.1 (EvoMaster)
[Chart: score over time. Y-axis: Score (alternate view: Relative Improvement); x-axis date label: Apr 19, 2026]
Updated 1mo ago
Evaluation Results
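Method    | Configuration             | Date    | Score | Relative Improvement (%)
----------|---------------------------|---------|-------|-------------------------
EvoMaster | Backend Model=GPT-5.4,... | 2026.04 | 41.1  | 202
OpenClaw  | Backend Model=GPT-5.4,... | 2026.04 | 13.6  | -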
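The page does not define Relative Improvement. The value 202 for EvoMaster is consistent with a percentage gain over OpenClaw, the only other listed method. The sketch below assumes that definition: (score - baseline) / baseline * 100. The function name `relative_improvement` is hypothetical and used only for illustration.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Percentage gain of `score` over `baseline`.

    Assumed definition; the leaderboard does not state how it computes
    its Relative Improvement column.
    """
    return (score - baseline) / baseline * 100.0


# Scores from the Evaluation Results table: EvoMaster 41.1, OpenClaw 13.6.
gain = relative_improvement(41.1, 13.6)
print(round(gain))  # 202
```

Under this assumption, (41.1 - 13.6) / 13.6 is about 2.022, which rounds to the reported 202%. That would also explain why OpenClaw, as the baseline, has no value in this column.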