    Xiao Meng, Zhou Junfeng, Zhou Yuanchun. Reinforcement Learning-Based Feature Generation Algorithm for Scientific Data[J]. Journal of Computer Research and Development, 2025, 62(9): 2127-2138. DOI: 10.7544/issn1000-1239.202550306

    Reinforcement Learning-Based Feature Generation Algorithm for Scientific Data

    • Feature generation (FG) aims to enhance the predictive potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data that improves the performance of downstream machine-learning models. Traditional methods face two challenges when generating features for scientific data. First, constructing effective high-order feature combinations requires deep and extensive domain-specific expertise. Second, as the order of feature combinations increases, the search space expands exponentially, making manual exploration prohibitively labor-intensive. Advances in the data-centric artificial intelligence (DCAI) paradigm have opened new avenues for automating feature generation. Inspired by this, the paper revisits the conventional feature generation workflow and proposes the multi-agent feature generation (MAFG) framework. Specifically, in the iterative exploration stage, multiple agents collaboratively construct mathematical transformation equations, synthesize and identify feature combinations with high information content, and use a reinforcement learning mechanism to evolve their strategies. After the exploration stage, MAFG uses large language models (LLMs) to provide an interpretable evaluation of the features generated at each significant improvement in model performance. Experimental results and case studies consistently demonstrate that the MAFG framework effectively automates the feature generation process and significantly improves a variety of downstream scientific data mining tasks.
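
    The iterative exploration stage described in the abstract can be illustrated with a much-simplified sketch. This is not the MAFG implementation: a single epsilon-greedy bandit over operations stands in for the multi-agent reinforcement learning mechanism, the absolute Pearson correlation with the target stands in for downstream model performance, and the LLM evaluation stage is omitted. All names (`generate_features`, `OPS`, `pearson`) and the synthetic data are hypothetical and chosen for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Hypothetical operation set; the paper's actual transformation set is not given here.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def generate_features(columns, target, steps=200, epsilon=0.2, seed=0):
    """Toy feature-generation loop (assumption: a bandit replaces the RL agents).

    Each step picks an operation epsilon-greedily, applies it to two randomly
    chosen existing features, and rewards the operation by how much the new
    feature improves the best |correlation| with the target seen so far.
    """
    rng = random.Random(seed)
    features = dict(columns)              # feature name -> list of values
    q = {op: 0.0 for op in OPS}           # running mean reward per operation
    counts = {op: 0 for op in OPS}
    best = max(abs(pearson(v, target)) for v in features.values())

    for _ in range(steps):
        if rng.random() < epsilon:
            op = rng.choice(list(OPS))    # explore
        else:
            op = max(q, key=q.get)        # exploit the best-valued operation
        a, b = rng.sample(list(features), 2)
        values = [OPS[op](x, y) for x, y in zip(features[a], features[b])]
        score = abs(pearson(values, target))
        reward = score - best
        counts[op] += 1
        q[op] += (reward - q[op]) / counts[op]   # incremental mean update
        if reward > 0:                    # keep only features that set a new best
            features[f"({a} {op} {b})"] = values
            best = score
    return features, best, q


# Synthetic data (assumption): the target depends on the product x0 * x1.
data_rng = random.Random(42)
columns = {
    name: [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
    for name in ("x0", "x1", "x2")
}
target = [a * b for a, b in zip(columns["x0"], columns["x1"])]
baseline = max(abs(pearson(v, target)) for v in columns.values())
features, best, q = generate_features(columns, target)
print(f"baseline={baseline:.3f} best={best:.3f} kept={len(features)} features")
```

    Because generated features re-enter the pool, later steps can compose them into higher-order combinations, which is also why the search space grows so quickly. A feature is kept only when it sets a new best score, so the reported best score never falls below the baseline.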