
    A Synergetic LLM-KG Framework for Cross-Domain Heterogeneous Data Query


      Abstract: Recent advances in large language models (LLMs) have significantly elevated the requirements for data quality in practical applications. Real-world scenarios often involve heterogeneous data from multiple correlated domains, yet cross-domain data integration remains challenging because privacy and security concerns prohibit centralized sharing, limiting effective utilization by LLMs. To address this issue, we propose a framework that integrates LLMs with knowledge graphs (KGs) for cross-domain heterogeneous data query, presenting a systematic governance solution under the LLM-KG paradigm. First, we employ domain adapters to fuse cross-domain heterogeneous data and construct the corresponding KG. To enhance query efficiency, we introduce knowledge line graphs and develop a homogeneous knowledge graph extraction (HKGE) algorithm for graph reconstruction, significantly improving cross-domain data governance performance. We then propose a trusted subgraph matching algorithm, TrustHKGM, which ensures high-confidence multi-domain queries through confidence computation and low-quality node filtering. Finally, we design a multi-domain knowledge line graph prompting (MKLGP) algorithm to enable efficient and trustworthy cross-domain query answering within the LLM-KG framework. Extensive experiments on multiple real-world datasets demonstrate the effectiveness and efficiency of our approach compared with state-of-the-art solutions.
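The abstract does not define the knowledge line graph, but if it follows the standard line-graph construction (an assumption on our part, not the paper's HKGE algorithm), each triple of the original KG becomes a node, and two nodes are adjacent when their triples share an entity, so entity-centric traversals become node-local lookups. A minimal illustrative sketch, with hypothetical entity and relation names:

```python
# Illustrative sketch only: the standard line-graph transformation of a
# triple-based KG. Each (head, relation, tail) triple becomes a node; two
# nodes are adjacent when the underlying triples share an entity.
from itertools import combinations
from collections import defaultdict

def kg_to_line_graph(triples):
    """Return the adjacency sets of the line graph of a set of KG triples."""
    # Index triple ids by every entity they touch.
    by_entity = defaultdict(list)
    for i, (head, _relation, tail) in enumerate(triples):
        by_entity[head].append(i)
        by_entity[tail].append(i)
    # Any two triples sharing an entity become neighbors in the line graph.
    adj = defaultdict(set)
    for ids in by_entity.values():
        for a, b in combinations(ids, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

# Hypothetical toy KG for demonstration.
kg = [
    ("alice", "works_at", "acme"),    # triple 0
    ("acme", "located_in", "paris"),  # triple 1: shares "acme" with 0
    ("bob", "lives_in", "paris"),     # triple 2: shares "paris" with 1
]
adj = kg_to_line_graph(kg)
# Triples 0 and 1 are adjacent via "acme"; 1 and 2 via "paris"; 0 and 2 are not.
```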
