| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Insight Generation | Internal non-scientific document collections Twitter & Mental Health | Set-level Score (Gemini-2.5-Flash): 4.5 | 10 | |
| Insight Generation | Internal non-scientific document collections | Set-level Score (Gemini-2.5-Flash): 4.61 | 10 | |
| Insight Generation | Internal non-scientific document collections Revenue & Finance Reports | Set-level Score (Gemini-2.5-Flash): 4.65 | 10 | |
| Insight Generation | Internal non-scientific document collections (Responsible AI Consulting) | Set-level Score (Gemini-2.5-Flash Judge): 4.5 | 10 | |
| Insight Generation | Internal non-scientific document collections Hotel Sales Strategies | Set-level Score (Gemini-2.5-Flash): 4.53 | 10 | |
| Insight Generation | Internal non-scientific document collections Finance - Investment 3 | Set-level Score (Gemini-2.5-Flash): 4.73 | 10 | |
| Insight Generation | Internal non-scientific document collections (Legal & Regulatory Compliance) | Set-level Score (Gemini-2.5-Flash): 4.35 | 10 | |
| Insight Generation | Internal non-scientific document collections Finance - Investment 2 | Set-level Score (Gemini-2.5-Flash): 4.73 | 10 | |
| Insight Generation | Internal non-scientific document collections Finance | Set Score (Gemini-2.5-Flash): 4.56 | 10 | |
| Insight Generation | Internal non-scientific document collections Gut Health Insights | Set-level Score (Gemini-2.5-Flash): 4.15 | 10 | |
| Insight Generation | Internal non-scientific document collections Climate Change Policy | Set-level Score (Gemini-2.5-Flash Judge): 4.77 | 10 | |
| Insight Generation | Internal non-scientific document collections Instagram Marketing | Set-level Score (Gemini-2.5-Flash Judge): 4.76 | 10 | |
| Insight Generation | Internal non-scientific document collections Legal Business Analysis | Set-level Score (Gemini-2.5-Flash Judge): 4.65 | 10 | |