Insight Generation on Internal non-scientific document collections Finance (LLM Judged)
[Chart: Set Score (Gemini-2.5-Flash) by date. Top entry: INSIGHTGEN, 4.56, Apr 21, 2026. Available metrics: Set Score (Gemini-2.5-Flash), Set Score (Claude-4-Sonnet).]
Updated 1mo ago
Evaluation Results
| Method | Base Model | Date | Set Score (Gemini-2.5-Flash) | Set Score (Claude-4-Sonnet) |
|---|---|---|---|---|
| INSIGHTGEN | Claude-3.5-... | 2026.04 | 4.56 | 3.33 |
| INSIGHTGEN | GPT-4o | 2026.04 | 4.44 | 3.11 |
| GPT+CoT | Claude-3.5-... | 2026.04 | 4.33 | 3.11 |
| FAISS+CoT | GPT-4o | 2026.04 | 3.67 | 3.00 |
| FAISS+CoT | Claude-3.5-... | 2026.04 | 3.44 | 2.89 |
| GPT+CoT | GPT-4o | 2026.04 | 3.33 | 2.67 |
| FAISS | GPT-4o | 2026.04 | 3.22 | 2.11 |
| Direct GPT | GPT-4o | 2026.04 | 3.11 | 2.11 |
| Direct GPT | Claude-3.5-... | 2026.04 | 2.44 | 1.78 |
| FAISS | Claude-3.5-... | 2026.04 | 1.67 | 1.67 |