Paper Understanding on ELAIPBench
Top score: 43.7 (AgentSPEX)
[Chart: Score over time (Apr 14, 2026)]
Evaluation Results
Method      Settings                    Date       Score
AgentSPEX   Model=GPT-5*, Domain=P...   2026.04    43.7
CoT         Model=GPT-5*, Domain=P...   2026.04    37.22
ReAct       Model=GPT-5*, Domain=P...   2026.04    33.8