Internal non-scientific document collections

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Insight Generation | Internal non-scientific document collections Twitter & Mental Health | Set-level Score (Gemini-2.5-Flash): 4.5 | 10 |
| Insight Generation | Internal non-scientific document collections | Set-level Score (Gemini-2.5-Flash): 4.61 | 10 |
| Insight Generation | Internal non-scientific document collections Revenue & Finance Reports | Set-level Score (Gemini-2.5-Flash): 4.65 | 10 |
| Insight Generation | Internal non-scientific document collections (Responsible AI Consulting) | Set-level Score (Gemini-2.5-Flash Judge): 4.5 | 10 |
| Insight Generation | Internal non-scientific document collections Hotel Sales Strategies | Set-level Score (Gemini-2.5-Flash): 4.53 | 10 |
| Insight Generation | Internal non-scientific document collections Finance - Investment 3 | Set-level Score (Gemini-2.5-Flash): 4.73 | 10 |
| Insight Generation | Internal non-scientific document collections (Legal & Regulatory Compliance) | Set-level Score (Gemini-2.5-Flash): 4.35 | 10 |
| Insight Generation | Internal non-scientific document collections Finance - Investment 2 | Set-level Score (Gemini 2.5 Flash): 4.73 | 10 |
| Insight Generation | Internal non-scientific document collections Finance | Set Score (Gemini-2.5-Flash): 4.56 | 10 |
| Insight Generation | Internal non-scientific document collections Gut Health Insights | Set-level Score (Gemini-2.5-Flash): 4.15 | 10 |
| Insight Generation | Internal non-scientific document collections Climate Change Policy | Set-level Score (Gemini-2.5-Flash Judge): 4.77 | 10 |
| Insight Generation | Internal non-scientific document collections Instagram Marketing | Set-level Score (Gemini-2.5-Flash Judge): 4.76 | 10 |
| Insight Generation | Internal non-scientific document collections Legal Business Analysis | Set-level Score (Gemini-2.5-Flash Judge): 4.65 | 10 |