Insight Generation on Internal non-scientific document collections (Gemini/Claude Judged)
Best Set-level Score (Gemini-2.5-Flash): 4.61, INSIGHTGEN (Apr 21, 2026)
[Chart: Set-level Score (Gemini-2.5-Flash) and Set-level Score (Claude-4-Sonnet) over time]
Evaluation Results
Method      Base model      Date     Set-level Score      Set-level Score
                                     (Gemini-2.5-Flash)   (Claude-4-Sonnet)
INSIGHTGEN  GPT-4o          2026.04  4.61                 3.61
INSIGHTGEN  Claude-3.5-...  2026.04  4.39                 3.39
FAISS+CoT   Claude-3.5-...  2026.04  4.13                 2.22
FAISS       GPT-4o          2026.04  3.78                 2.70
GPT+CoT     Claude-3.5-...  2026.04  3.48                 1.87
FAISS+CoT   GPT-4o          2026.04  3.26                 2.35
GPT+CoT     GPT-4o          2026.04  3.09                 2.26
Direct GPT  Claude-3.5-...  2026.04  2.96                 1.91
Direct GPT  GPT-4o          2026.04  2.78                 2.09
FAISS       Claude-3.5-...  2026.04  1.87                 2.04
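The results are ordered by the Gemini-2.5-Flash judge only. The sketch below is our own illustration and not part of the benchmark. It re-sorts the same rows by the Claude-4-Sonnet judge and measures how closely the two judges' orderings agree using a Spearman rank correlation. The choice of Spearman's no-ties formula and the helper names `ranks` and `spearman_no_ties` are assumptions made for this example. The scores are transcribed from the results above, and the truncated base-model name "Claude-3.5-..." is kept as it appears on the page.

```python
# Scores transcribed from the Evaluation Results above.
# "Claude-3.5-..." is truncated on the source page and is kept as-is.
ROWS = [
    # (method, base model, Gemini-2.5-Flash score, Claude-4-Sonnet score)
    ("INSIGHTGEN", "GPT-4o", 4.61, 3.61),
    ("INSIGHTGEN", "Claude-3.5-...", 4.39, 3.39),
    ("FAISS+CoT", "Claude-3.5-...", 4.13, 2.22),
    ("FAISS", "GPT-4o", 3.78, 2.70),
    ("GPT+CoT", "Claude-3.5-...", 3.48, 1.87),
    ("FAISS+CoT", "GPT-4o", 3.26, 2.35),
    ("GPT+CoT", "GPT-4o", 3.09, 2.26),
    ("Direct GPT", "Claude-3.5-...", 2.96, 1.91),
    ("Direct GPT", "GPT-4o", 2.78, 2.09),
    ("FAISS", "Claude-3.5-...", 1.87, 2.04),
]


def ranks(scores):
    """Hypothetical helper: 1-based ranks, 1 = highest score. Assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_no_ties(a, b):
    """Hypothetical helper: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid only without ties."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


gemini = [r[2] for r in ROWS]
claude = [r[3] for r in ROWS]

# Print the same rows re-sorted by the Claude-4-Sonnet judge.
for method, base, _, score in sorted(ROWS, key=lambda r: r[3], reverse=True):
    print(f"{method:<12} {base:<16} {score:.2f}")

print(f"Spearman rho between judges: {spearman_no_ties(gemini, claude):.3f}")
```

Running the sketch prints a rho of 0.685. Both judges place the two INSIGHTGEN entries first and second, but they reorder the baselines. For example, FAISS with Claude-3.5-... is last under Gemini-2.5-Flash but eighth under Claude-4-Sonnet.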