
LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation

About

Missing data imputation is a critical challenge in various domains, such as healthcare and finance, where data completeness is vital for accurate analysis. Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation, making them a promising tool for data imputation. However, challenges persist in designing effective prompts for a finetuning-free process and in mitigating biases and uncertainty in LLM outputs. To address these issues, we propose a novel framework, LLM-Forest, which introduces a "forest" of few-shot-prompted LLM "trees" whose outputs are aggregated via confidence-weighted voting based on LLM self-assessment, inspired by ensemble learning (Random Forest). The framework builds on a new concept of bipartite information graphs, used to identify high-quality, relevant neighboring entries at both feature and value granularity. Extensive experiments on 9 real-world datasets demonstrate the effectiveness and efficiency of LLM-Forest.
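The confidence-weighted voting step described above can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function name, input format, and the sample values are hypothetical, assuming each LLM "tree" returns a candidate imputation value together with a self-assessed confidence score.

```python
from collections import defaultdict

def confidence_weighted_vote(tree_outputs):
    """Aggregate imputation candidates from several LLM 'trees'.

    tree_outputs: list of (value, confidence) pairs, one per tree,
    where confidence is the tree's self-assessed score in [0, 1].
    Returns the candidate whose summed confidence is highest.
    """
    scores = defaultdict(float)
    for value, confidence in tree_outputs:
        scores[value] += confidence
    # The winning candidate is the one with the largest accumulated weight.
    return max(scores, key=scores.get)

# Hypothetical example: three trees impute a missing categorical field.
votes = [("high", 0.9), ("normal", 0.6), ("high", 0.4)]
print(confidence_weighted_vote(votes))  # "high" wins with total weight 1.3
```

Note that, unlike a plain majority vote, a single high-confidence tree can outweigh two low-confidence ones, which is the point of weighting by self-assessment.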

Xinrui He, Yikun Ban, Jiaru Zou, Tianxin Wei, Curtiss B. Cook, Jingrui He • 2024

Related benchmarks

Task            | Dataset                | Metric   | Result | Rank
----------------|------------------------|----------|--------|-----
Data Imputation | NPHA                   | Accuracy | 66.35  | 30
Data Imputation | Gliomas                | Accuracy | 84.41  | 30
Data Imputation | Cancer                 | Accuracy | 73.51  | 28
Data Imputation | Diabetes (1/3 omitted) | Accuracy | 63.21  | 16
Data Imputation | Diabetes               | Accuracy | 63.18  | 14
Data Imputation | Concrete               | MAE      | 0.1036 | 14
Data Imputation | Yacht                  | MAE      | 0.1478 | 14
Data Imputation | Wine                   | MAE      | 0.0768 | 14
Data Imputation | Housing                | MAE      | 0.1026 | 14
Data Imputation | Credit-g               | Accuracy | 54.46  | 13

Showing 10 of 14 rows.
