
IoDResearch: Deep Research on Private Heterogeneous Data via the Internet of Data

About

The rapid growth of multi-source, heterogeneous, and multimodal scientific data has increasingly exposed the limitations of traditional data management. Most existing DeepResearch (DR) efforts focus primarily on web search while overlooking local private data. Consequently, these frameworks exhibit low retrieval efficiency for private data and fail to comply with the FAIR principles, ultimately resulting in inefficiency and limited reusability. To this end, we propose IoDResearch (Internet of Data Research), a private data-centric Deep Research framework that operationalizes the Internet of Data paradigm. IoDResearch encapsulates heterogeneous resources as FAIR-compliant digital objects, and further refines them into atomic knowledge units and knowledge graphs, forming a heterogeneous graph index for multi-granularity retrieval. On top of this representation, a multi-agent system supports both reliable question answering and structured scientific report generation. Furthermore, we establish the IoD DeepResearch Benchmark to systematically evaluate both data representation and Deep Research capabilities in IoD scenarios. Experimental results on retrieval, QA, and report-writing tasks show that IoDResearch consistently surpasses representative RAG and Deep Research baselines. Overall, IoDResearch demonstrates the feasibility of private-data-centric Deep Research under the IoD paradigm, paving the way toward more trustworthy, reusable, and automated scientific discovery.
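The layered representation described above (digital objects, atomic knowledge units, and knowledge-graph entities linked in one heterogeneous index) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, node kinds, metadata fields, sample records, and the token-overlap scoring are all assumptions chosen to keep the example self-contained. A real system would presumably use learned embeddings and a richer digital-object schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One vertex in the index. All field names are hypothetical."""
    node_id: str
    kind: str  # assumed kinds: "digital_object", "knowledge_unit", "entity"
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. license, format


class HeterogeneousGraphIndex:
    """Toy index holding nodes of several granularities plus undirected links."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(set)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def link(self, a, b):
        # Undirected edge, e.g. digital object <-> knowledge unit.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def search(self, query, kind, top_k=3):
        """Rank nodes of one granularity by shared lowercase tokens.

        Token overlap is a stand-in for whatever retriever the real
        framework uses; it is here only to make the sketch runnable.
        """
        query_tokens = set(query.lower().split())
        scored = []
        for node in self.nodes.values():
            if node.kind != kind:
                continue
            overlap = len(query_tokens & set(node.text.lower().split()))
            if overlap:
                scored.append((overlap, node.node_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [node_id for _, node_id in scored[:top_k]]

    def neighbors(self, node_id, kind):
        """Follow edges from a hit to linked nodes of another granularity."""
        return sorted(
            n for n in self.edges[node_id] if self.nodes[n].kind == kind
        )


def build_demo_index():
    """Build a three-node index from made-up sample data."""
    index = HeterogeneousGraphIndex()
    index.add_node(Node(
        "do:1", "digital_object", "soil moisture sensor readings csv",
        {"license": "CC-BY-4.0", "format": "text/csv"},
    ))
    index.add_node(Node(
        "ku:1", "knowledge_unit",
        "soil moisture dropped below 10 percent in july",
    ))
    index.add_node(Node("ent:soil_moisture", "entity", "soil moisture"))
    index.link("do:1", "ku:1")
    index.link("ku:1", "ent:soil_moisture")
    return index
```

In this sketch, a query is first matched at the fine-grained level with `search(query, "knowledge_unit")`, and `neighbors(hit, "digital_object")` then walks back to the source object, which is one plausible reading of "multi-granularity retrieval".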

Zhuofan Shi, Zijie Guo, Xinjian Ma, Gang Huang, Yun Ma, Xiang Jing • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Report writing | Task 3 Single-domain | LLM-as-Judge Score | 8.31 | 5 |
| Report writing | Task 3 Cross-domain | LLM-as-Judge Score | 8.23 | 5 |
| Digital Object Retrieval | IoD Deep Research Benchmark | Precision | 76.26 | 4 |
| Question Answering | Task 2 Single-domain | Answer Accuracy | 79.98 | 4 |
| Question Answering | Task 2 Cross-domain | Answer Accuracy | 59.4 | 4 |
