Evolutionary Large Language Model for Automated Feature Transformation

About

Feature transformation aims to reconstruct the feature space of raw features to enhance the performance of downstream models. However, the number of feature and operation combinations grows exponentially, which makes it difficult for existing methods to explore such a wide space efficiently. In addition, their optimization is driven solely by the accuracy of downstream models in specific domains and neglects the acquisition of general feature knowledge. To fill this research gap, we propose an evolutionary LLM framework for automated feature transformation. The framework consists of two parts: 1) constructing a multi-population database through an RL data collector and maintaining it with evolutionary algorithm strategies, and 2) exploiting the sequence-understanding ability of Large Language Models (LLMs) by using few-shot prompts that guide the LLM to generate superior samples based on the distinctions among feature transformation sequences. The multi-population database initially provides a wide search scope for discovering excellent populations. Through culling and evolution, high-quality populations are given more opportunities, which furthers the pursuit of optimal individuals. By integrating LLMs with evolutionary algorithms, we achieve efficient exploration of a vast space while harnessing feature knowledge to drive optimization, thus realizing a more adaptable search paradigm. Finally, we empirically demonstrate the effectiveness and generality of the proposed method.
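The two parts described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Individual`, `MultiPopulationDB`, and `build_few_shot_prompt`, the string encoding of transformation sequences, the per-population capacity, the culling rule (keep the populations with the best top score), and the ascending-score ordering of the few-shot examples are all assumptions made for illustration. The RL data collector and the LLM call itself are omitted.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Individual:
    sequence: str  # hypothetical text encoding of a feature transformation sequence
    score: float   # downstream-task performance (higher is better)


@dataclass
class MultiPopulationDB:
    """Hypothetical multi-population database with simple evolutionary maintenance."""
    populations: List[List[Individual]] = field(default_factory=list)
    capacity: int = 8  # assumed maximum number of individuals per population

    def add(self, pop_idx: int, ind: Individual) -> None:
        # Insert, keep the population sorted best-first, and drop the weakest overflow.
        pop = self.populations[pop_idx]
        pop.append(ind)
        pop.sort(key=lambda i: i.score, reverse=True)
        del pop[self.capacity:]

    def cull(self, keep: int) -> None:
        # Assumed culling rule: retain the `keep` populations with the best top score.
        self.populations.sort(
            key=lambda p: max((i.score for i in p), default=float("-inf")),
            reverse=True,
        )
        del self.populations[keep:]


def build_few_shot_prompt(population: List[Individual], k: int = 3,
                          rng: Optional[random.Random] = None) -> str:
    """Sample up to k individuals and list them from worst to best score,
    leaving the final numbered slot open for the LLM to complete."""
    rng = rng or random.Random(0)
    shots = rng.sample(population, min(k, len(population)))
    shots.sort(key=lambda i: i.score)  # ascending, so later examples are better
    lines = ["Each line is a feature transformation sequence; later lines perform better."]
    lines += [f"{n}. {ind.sequence}" for n, ind in enumerate(shots, 1)]
    lines.append(f"{len(shots) + 1}.")
    return "\n".join(lines)
```

In a full loop, the LLM's completion would be parsed into a new sequence, evaluated on the downstream task, and written back with `add`, with `cull` called periodically so that stronger populations receive more of the sampling budget.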

Nanxu Gong, Chandan K. Reddy, Wangyang Ying, Haifeng Chen, Yanjie Fu • 2024

Related benchmarks

Task	Dataset	Metric	Result
Classification	Electricity	Mean Test Error Rate	0.0679	27
Classification	German Credit UCIrvine	Macro F1	77.5	25
Regression	Airfoil UCIrvine	1-RAE	0.6174	24
Regression	Openml_586	1-RAE	0.6328	24
Classification	German Credit UCIrvine (5-fold cross-val)	Macro F1	0.7639	17
Classification	Ionosphere UCIrvine (5-fold cross-validation)	Macro F1 Score	96.01	17
Classification	PimaIndian Kaggle (5-fold cross-validation)	Macro F1 Score	89.66	17
Classification	Messidor Feature UCIrvine (5-fold cross-validation)	Macro F1	0.748	17
Classification	SVMGuide3 LibSVM (5-fold cross-val)	Macro F1	82.7	17
Classification	Amazon Employee Kaggle (5-fold cross-validation)	Macro F1	93.17	17
