Report Generation on eRisk Reddit 2018
[Chart: Trajectory Coverage. Highlighted entry: GPT-traj, 4.9 (May 14, 2026).]
Trajectory Coverage
Temporal Coherence
Sensitivity to Change Points
Segment-Level Specificity
Overall Preference
Updated 19d ago
Evaluation Results
Method    Evaluator Model  Date     Trajectory Coverage  Temporal Coherence  Sensitivity to Change Points  Segment-Level Specificity  Overall Preference
GPT-traj  GPT 5.2          2026.05  4.9                  4.6                 4.2                           4                          4.1
GPT-traj  Gemini...        2026.05  4.9                  4.5                 4.2                           3.9                        4
GPT-traj  Claude...        2026.05  4.8                  4.8                 4.2                           4.8                        4.8
GPT-traj  DeepSe...        2026.05  4.3                  4.3                 4                             4.3                        3.9
Baseline  Gemini...        2026.05  2.6                  3                   1.9                           2.8                        2.5
Baseline  DeepSe...        2026.05  2.6                  3                   2.4                           2.3                        2.5
Baseline  Claude...        2026.05  2.1                  2.3                 1.7                           1.4                        1.9
Baseline  GPT 5.2          2026.05  1.5                  2.3                 1.3                           1.5                        1.5