    Implementation of a Kernel-Based Chinese Relation Extraction System

    Abstract: Entity relation extraction (RE) is an important component of information extraction. This paper presents a kernel-based system for automatic Chinese entity relation extraction. The system applies an improved semantic sequence kernel function together with the k-nearest-neighbor (KNN) learning algorithm to build a classifier that classifies and labels relation types. Experiments cover the 3 relation types and their 6 subtypes defined in the ACE evaluation guidelines. The approach reaches an average precision of up to 88%, significantly higher than feature-vector-based approaches and traditional sequence kernel methods. It generalizes well on small training sets and makes it easy to learn new entity relations. The system consists of 8 independent modules, including named entity detection and candidate generation, which eases maintenance and upgrades. It can run either as a stand-alone Java application or as plug-ins embedded in the open text-processing platform GATE. To make better use of the extraction results, the system extends the traditional binary relation: along with each relation, it also extracts the relation's description (for example, the job in an employment relation), yielding a complete Chinese entity relation extraction system.
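The abstract does not spell out the "improved semantic sequence kernel", so the sketch below is only an assumption-laden illustration of the general pattern it describes: a sequence kernel compares relation instances, and KNN votes over the most similar training instances. It swaps in a standard gap-weighted subsequence kernel (in the style of the string subsequence kernel of Lodhi et al.), computed by brute force over token sequences. The function names, the parameters `n`, `lam` and `k`, the entity-type placeholder tokens, and the ACE-style labels in the toy data are all hypothetical choices for illustration, not the authors' system.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def subsequence_features(tokens, n=2, lam=0.5):
    """Explicit feature map of a gap-weighted subsequence kernel (illustrative).

    Each length-n subsequence u (not necessarily contiguous) contributes
    lam ** span, where span = last index - first index + 1, so gappy
    matches count less than contiguous ones.
    """
    phi = defaultdict(float)
    for idx in combinations(range(len(tokens)), n):
        u = tuple(tokens[i] for i in idx)
        phi[u] += lam ** (idx[-1] - idx[0] + 1)
    return phi


def kernel(s, t, n=2, lam=0.5):
    """Inner product of the two sequences' feature maps."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)


def normalized_kernel(s, t, n=2, lam=0.5):
    """Cosine-normalized kernel in [0, 1]; 0.0 if either side has no features."""
    norm = math.sqrt(kernel(s, s, n, lam) * kernel(t, t, n, lam))
    return kernel(s, t, n, lam) / norm if norm > 0 else 0.0


def knn_classify(query, train, k=3, n=2, lam=0.5):
    """Majority vote among the k training instances most similar to query.

    With a normalized kernel, the kernel-induced distance is
    sqrt(2 - 2 * K(x, y)), so the highest similarity is the nearest neighbor.
    Ties in similarity keep training-set order (sorted() is stable).
    """
    scored = sorted(
        ((normalized_kernel(query, seq, n, lam), label) for seq, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy instances: token sequences with the entity mentions
# replaced by their entity types; labels borrow ACE-style names only
# for illustration.
TRAIN = [
    (["PER", "担任", "ORG", "总裁"], "EMP-ORG"),
    (["PER", "加入", "ORG"], "EMP-ORG"),
    (["PER", "住", "在", "GPE"], "PHYS"),
    (["PER", "到达", "GPE"], "PHYS"),
]

if __name__ == "__main__":
    print(knn_classify(["PER", "担任", "ORG", "经理"], TRAIN))  # EMP-ORG
    print(knn_classify(["PER", "住", "GPE"], TRAIN))  # PHYS
```

The brute-force feature map enumerates all index combinations, which is only practical for short token windows. A real system would use the dynamic-programming form of the kernel. KNN fits the abstract's claim about small training sets and new relations: adding a relation type only means adding labeled instances, with no retraining step.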