
    Decision Tree-Based Assembly for Learnwares from Heterogeneous Feature Spaces

    Abstract: Machine learning techniques have been widely applied in many fields, but they still face numerous challenges, including heavy dependence on large-scale training data, limited adaptability to changing environments, data privacy and ownership concerns, and catastrophic forgetting. The learnware paradigm offers a new approach to addressing these challenges systematically: instead of training models from scratch, it reuses existing high-quality models in an organized way to solve new tasks. A learnware consists of a model and a specification describing the model's capabilities, and diverse learnwares are managed uniformly by a learnware dock system. Existing studies have validated the paradigm when learnwares and user tasks share the same feature space. In real-world scenarios, however, learnwares whose feature space exactly matches that of the user's task are often scarce or nonexistent. Prior work has preliminarily explored identifying and reusing a single learnware across heterogeneous feature spaces, but a single learnware has limited capability and seldom performs well across the entire user task. Multiple heterogeneous learnwares may complement one another, yet the key challenge is to accurately identify and integrate the regions where each recommended learnware performs well locally. To this end, this paper proposes a decision tree-like algorithm for assembling and reusing multiple learnwares, which effectively integrates the high-confidence regions of learnwares from different feature spaces. To further improve the accuracy of learnware identification, it also introduces an identification criterion based on estimating each learnware's reuse performance. Experiments show that even when the system contains no model that exactly matches the user's task, it can still identify multiple heterogeneous learnwares that are helpful for the task, and the tree-structured assembly of these recommended learnwares significantly outperforms a model trained from scratch on the user's local data.
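The abstract describes the assembly only at a high level. To make the general idea concrete, here is a minimal sketch of one possible tree-like assembly. It is not the authors' algorithm, and everything in it is a hypothetical assumption: the `Learnware` class, the column mapping `features`, the accuracy proxy `estimate_reuse_accuracy`, the per-node `fit_threshold` rule with a `target` accuracy, and the user's `fallback` model. Learnwares are ranked by estimated reuse accuracy on a small labeled user sample. Each one then becomes a node that accepts a sample only inside its own high-confidence region and otherwise passes the sample to the next node.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class Learnware:
    """Hypothetical stand-in for a learnware: a model that sees only its own
    subset of the user's features and returns (label, confidence)."""
    name: str
    features: Tuple[int, ...]  # assumed mapping from user columns to this model's inputs
    model: Callable[[Vector], Tuple[int, float]]

    def __call__(self, x: Vector) -> Tuple[int, float]:
        return self.model(tuple(x[i] for i in self.features))


def estimate_reuse_accuracy(lw: Learnware, X: Sequence[Vector], y: Sequence[int]) -> float:
    """Crude proxy for reuse-performance estimation: accuracy on a small labeled user sample."""
    return sum(lw(x)[0] == t for x, t in zip(X, y)) / len(y)


def fit_threshold(lw: Learnware, X: Sequence[Vector], y: Sequence[int], target: float) -> float:
    """Lowest confidence threshold whose accepted samples reach `target` accuracy,
    i.e. the largest region in which this learnware is trusted."""
    scored = []
    for x, t in zip(X, y):
        label, conf = lw(x)
        scored.append((conf, label == t))
    for tau in sorted({c for c, _ in scored}):
        hits = [ok for c, ok in scored if c >= tau]
        if sum(hits) / len(hits) >= target:
            return tau
    return float("inf")  # no trustworthy region: this node never fires


def assemble(learnwares, X, y, fallback, target=0.9):
    """Build a cascade (a one-sided decision tree): each node routes x to its
    learnware if the confidence clears the node's threshold, else passes x down."""
    ranked = sorted(learnwares, key=lambda lw: estimate_reuse_accuracy(lw, X, y), reverse=True)
    nodes: List[Tuple[Learnware, float]] = []
    rest = list(zip(X, y))
    for lw in ranked:
        if not rest:
            break
        tau = fit_threshold(lw, [x for x, _ in rest], [t for _, t in rest], target)
        nodes.append((lw, tau))
        # Samples this node declines are the ones used to fit the next node.
        rest = [(x, t) for x, t in rest if lw(x)[1] < tau]

    def predict(x: Vector) -> int:
        for lw, tau in nodes:
            label, conf = lw(x)
            if conf >= tau:
                return label
        return fallback(x)  # leaf: the user's own local model

    return predict


def sign_model(v: Vector) -> Tuple[int, float]:
    # Toy single-feature model: predict 1 if positive; confidence grows with |value|.
    return int(v[0] > 0), min(1.0, abs(v[0]))


lw_a = Learnware("A", (0,), sign_model)  # only sees user feature 0
lw_b = Learnware("B", (1,), sign_model)  # only sees user feature 1
X = [(2.0, 0.1), (-2.0, 0.3), (0.1, 2.0), (0.2, -2.0), (1.5, -0.5), (-0.3, -1.0)]
y = [int(a + b > 0) for a, b in X]  # ground truth depends on both features

tree = assemble([lw_a, lw_b], X, y, fallback=lambda x: 0)
print([tree(x) for x in X], y)
```

In this toy setup, neither single-feature learnware is correct on every sample. The cascade lets learnware A handle the samples where `|x0|` is large and sends the rest to learnware B. Fitting each threshold only on the samples that earlier nodes declined is one simple way to let later learnwares specialize in what remains.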

       
