Insight Generation on Internal non-scientific document collections (Responsible AI Consulting)
[Chart: Set-level Score (Gemini-2.5-Flash Judge) over time. Top entry: INSIGHTGEN, 4.5, Apr 21, 2026. Alternate metric: Set-level Score (Claude-4-Sonnet Judge).]
Updated 1mo ago
Evaluation Results
| Method | Base Model | Date | Set-level Score (Gemini-2.5-Flash Judge) | Set-level Score (Claude-4-Sonnet Judge) |
|---|---|---|---|---|
| INSIGHTGEN | GPT-4o | 2026.04 | 4.5 | 3.31 |
| INSIGHTGEN | Claude-3.5-... | 2026.04 | 4.31 | 3.5 |
| FAISS | GPT-4o | 2026.04 | 3.94 | 3.13 |
| FAISS+CoT | Claude-3.5-... | 2026.04 | 3.88 | 2.5 |
| Direct GPT | GPT-4o | 2026.04 | 3.81 | 2.56 |
| GPT+CoT | Claude-3.5-... | 2026.04 | 3.25 | 2.13 |
| GPT+CoT | GPT-4o | 2026.04 | 3 | 2.5 |
| FAISS+CoT | GPT-4o | 2026.04 | 2.69 | 2.19 |
| Direct GPT | Claude-3.5-... | 2026.04 | 2.63 | 1.44 |
| FAISS | Claude-3.5-... | 2026.04 | 2.31 | 1.94 |