
Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback

About

The scientific research paradigm is undergoing a profound transformation owing to the development of Artificial Intelligence (AI). Recent works demonstrate that various AI-assisted research methods can substantially improve research efficiency by improving data analysis, accelerating computation, and fostering novel idea generation. To move further towards the ultimate goal (i.e., automatic scientific research), in this paper we introduce Dolphin, a closed-loop LLM-driven framework that raises the automation level of scientific research. Dolphin first generates novel ideas based on feedback from previous experiments and on relevant papers ranked by topic and task attributes. The generated ideas are then implemented using a code template that is refined and debugged with the designed exception-traceback-guided local code structure. Finally, Dolphin automatically analyzes the results of each idea and feeds them back into the next round of idea generation. Experiments are conducted on benchmark datasets covering different topics and on a subset of MLE-bench. Results show that Dolphin can continuously improve performance on the input topic over successive loops. We highlight that Dolphin can automatically propose methods that are comparable to the state-of-the-art in some tasks such as 3D point classification.

Jiakang Yuan, Xiangchao Yan, Shiyang Feng, Bo Zhang, Tao Chen, Botian Shi, Wanli Ouyang, Yu Qiao, Lei Bai, Bowen Zhou • 2025
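
The three stages named in the abstract (idea generation, implementation with traceback-guided debugging, and result feedback) can be pictured with a small Python sketch. This is only an illustration of the loop's shape, not the authors' implementation: every function name here (`propose_idea`, `run_experiment`, `locate_failure`, `closed_loop`) is hypothetical, the LLM calls are replaced by trivial stubs, and a failed run is simply recorded rather than repaired.

```python
import traceback
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    idea: str
    score: Optional[float]        # None if the experiment raised an exception
    error: Optional[str] = None   # short failure summary, if any


def propose_idea(history: List[Record], round_idx: int) -> str:
    """Hypothetical stand-in for LLM idea generation conditioned on past results."""
    best = max((r.score for r in history if r.score is not None), default=None)
    return f"idea-{round_idx} (best so far: {best})"


def run_experiment(idea: str, round_idx: int) -> float:
    """Hypothetical experiment; fails in round 1 to exercise the failure path."""
    if round_idx == 1:
        raise ValueError("shape mismatch")
    return 0.5 + 0.1 * round_idx


def locate_failure(exc: BaseException) -> str:
    """Summarise where an exception was raised, using the innermost traceback frame."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return f"{type(exc).__name__} in {frame.name}(): {exc}"


def closed_loop(rounds: int) -> List[Record]:
    """Run idea -> experiment -> feedback for a fixed number of rounds."""
    history: List[Record] = []
    for i in range(rounds):
        idea = propose_idea(history, i)            # uses feedback from earlier rounds
        try:
            score = run_experiment(idea, i)
            history.append(Record(idea, score))
        except Exception as exc:                   # keep the failure as feedback too
            history.append(Record(idea, None, locate_failure(exc)))
    return history


if __name__ == "__main__":
    for record in closed_loop(3):
        print(record)
```

In a fuller system the failure summary would presumably be handed to a repair step before the experiment is retried; here it is kept in the history so that later rounds can see both scores and errors.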

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Time Series Forecasting | ETTh1 | -- | 601 |
| 3D Point Cloud Classification | ModelNet40 (test) | OA 93.9 | 297 |
| Sentiment Classification | SST2 (test) | Accuracy 92.5 | 214 |
| 2D image classification | CIFAR-100 (test) | Top-1 Accuracy 82 | 5 |
| Enhancer Activity Prediction | AutoEAP | HK-PCC 0.76 | 4 |
| Molecular Dynamics | AutoMD | Energy MAE 0.152 | 4 |
| Power Flow Estimation | IEEE 39-Bus | RMSE 0.0046 | 4 |
| Reactant Yield Prediction | AutoRYP | R-squared 31.8 | 4 |
| Transcription Prediction | AutoTPPR | MSE 0.173 | 4 |
| Detecting insults in social commentary | Detecting insults in social commentary MLE-bench | Score 84.7 | 1 |
Showing 10 of 12 rows
