Pair-wise Research Idea Comparison on D_pair Hard
[Chart: Acchard over time; highlighted point: InnoEval, 63, Feb 16, 2026]
Metric: Acchard
Evaluation Results
Method          Backbone        Date      Acchard
InnoEval        DeepSeek-V3.2   2026.02   63
ScholarEval     DeepSeek-V3.2   2026.02   60
InternAgent     DeepSeek-V3.2   2026.02   59.5
GraphEval       DeepSeek-V3.2   2026.02   44.5
ResearchAgent   DeepSeek-V3.2   2026.02   43
CoT             DeepSeek-V3.2   2026.02   40.5
RAG             DeepSeek-V3.2   2026.02   39
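The page does not define Acchard. Assuming it is the accuracy on the hard split of D_pair, where each item is a pair of research ideas with a reference label for which idea is stronger, a score like those above could be computed as in this minimal sketch. The function name `pairwise_accuracy` and the "A"/"B" label encoding are hypothetical, and the toy labels are illustrative, not benchmark data.

```python
from typing import Sequence


def pairwise_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Hypothetical metric: percentage of idea pairs where the evaluator's
    predicted winner ("A" or "B") matches the reference winner."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one idea pair")
    # Count pairs where the predicted winner agrees with the reference label.
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Toy labels for four idea pairs (illustrative only, not from D_pair).
preds = ["A", "B", "B", "A"]
refs = ["A", "B", "A", "A"]
print(pairwise_accuracy(preds, refs))  # 75.0
```

Under this reading, the Hard score is the same computation restricted to the pairs in the hard split.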