Ternary Research Idea Classification on D_point
[Chart: Accuracy (3 Class) and F1 Score (3 Class); top result 73.73 Accuracy (3 Class), InnoEval; date shown: Feb 16, 2026]
Updated 3mo ago
Evaluation Results
| Method | Method details | Date | Accuracy (3 Class) | F1 Score (3 Class) |
|---|---|---|---|---|
| InnoEval | Backbone=DeepSeek-V3.2 | 2026.02 | 73.73 | 74.56 |
| ScholarEval | Backbone=DeepSeek-V3.2 | 2026.02 | 61.75 | 58.38 |
| InternAgent | Backbone=DeepSeek-V3.2 | 2026.02 | 56.68 | 43.05 |
| ResearchAgent | Backbone=DeepSeek-V3.2 | 2026.02 | 54.84 | 39.81 |
| GraphEval | Backbone=DeepSeek-V3.2 | 2026.02 | 53.46 | 33.03 |
| RAG | Backbone=DeepSeek-V3.2 | 2026.02 | 35.48 | 28.45 |
| CoT | Backbone=DeepSeek-V3.2 | 2026.02 | 34.56 | 27.86 |
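
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how three-class accuracy and F1 could be computed from predictions. The page does not say how the F1 score is averaged across classes, so the macro-averaging used here is an assumption, and the class names (`low`, `medium`, `high`) and function names are hypothetical placeholders rather than the benchmark's actual labels or code.

```python
# Sketch only: macro-averaged F1 is an ASSUMPTION; the benchmark page
# does not state which averaging it uses. Label names are hypothetical.
LABELS = ("low", "medium", "high")


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted instances scores 0 here.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    gold = ["low", "medium", "high", "high"]
    pred = ["low", "high", "high", "high"]
    print(f"accuracy = {100 * accuracy(gold, pred):.2f}")
    print(f"macro F1 = {100 * macro_f1(gold, pred):.2f}")
```

The leaderboard reports both metrics on a 0 to 100 scale, which is why the usage example multiplies by 100. Because macro-averaging weights every class equally, a method that rarely predicts one class can show an F1 well below its accuracy, but whether that explains the gaps in the table depends on the averaging the benchmark actually uses.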