Ai2 Scholar QA: Organized Literature Synthesis with Attribution

About

Retrieval-augmented generation is increasingly effective in answering scientific questions from literature, but many state-of-the-art systems are expensive and closed-source. We introduce Ai2 Scholar QA, a free online scientific question answering application. To facilitate research, we make our entire pipeline public: as a customizable open-source Python package and interactive web app, along with paper indexes accessible through public APIs and downloadable datasets. We describe our system in detail and present experiments analyzing its key design decisions. In an evaluation on a recent scientific QA benchmark, we find that Ai2 Scholar QA outperforms competing systems.

Amanpreet Singh, Joseph Chee Chang, Chloe Anastasiades, Dany Haddad, Aakanksha Naik, Amber Tanaka, Angele Zamarron, Cecile Nguyen, Jena D. Hwang, Jason Dunkleberger, Matt Latzke, Smita Rao, Jaron Lochner, Rob Evans, Rodney Kinney, Daniel S. Weld, Doug Downey, Sergey Feldman • 2025

Related benchmarks

Task	Dataset	Result
Deep Research	ResearchQA	Score: 75	42
Long-form research	DRB	Score: 36.1	39
Deep Research	HealthBench	Score: 32	38
Science Question Answering	ResearchQA	Accuracy (ResearchQA): 75	37
Deep Research Report Generation	DRB	Overall Score: 36.1	24
Search-based Question Answering	SQA v2	Overall Score: 87.7	21
Aggregate Deep Research Performance	SQA, ResearchQA, and DRB v2	Average Score: 66.3	21
Deep Research	HealthBench ResearchQA DRB Macro Average	Average Score: 47.7	21
Deep Research	DeepResearchBench (DRB)	Overall Score: 36.1	21
Deep Research	SQA v2	Score: 87.7	18

Showing 10 of 14 rows
