    Citation: 马健伟, 王铁鑫, 江宏, 陈涛, 张超, 李博涵. Knowledge Extraction based on Deep Semantics Analysis towards Police Case Texts[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202330691

    Knowledge Extraction based on Deep Semantics Analysis towards Police Case Texts

    Abstract: As the main record of how public security organs handle and close cases, the police dossier contains a large amount of critical policing information. Extracting information from police dossiers is an important means of analyzing cases, mining crime trends, and improving public-security management. However, dossiers are mostly written in free natural language by frontline police officers, which makes key information difficult to extract. Traditional information extraction from police dossiers relies heavily on manual effort and predefined templates, resulting in low efficiency and poor generality. To address these problems, and taking the policing-specific characteristics of dossiers into account, we propose a knowledge extraction method based on deep semantic analysis. The method consists of two core tasks: named entity recognition and relation extraction. For Chinese text, the proposed named entity recognition method fuses the structural and glyph features of Chinese characters. Building on the entity recognition results and a specially constructed policing thesaurus, the proposed relation extraction method supports two extraction modes: one based on trigger rules and one based on trigger words. On a public Weibo dataset and on a real dossier dataset provided by our project partner, the ** Branch of the ** Municipal Public Security Bureau, the proposed named entity recognition method achieves better overall precision and recall than several baseline models. The automatically extracted relations have also been verified and approved by the ** Branch, and an information system integrating the method has been deployed and is in use there.
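    To make the trigger-word extraction mode more concrete, here is a minimal Python sketch of how relations might be read off a sentence once entities have been recognized. It is not the authors' implementation: the `Entity` format, the `TRIGGERS` lexicon, the relation names, and the "nearest entity on each side of the trigger" heuristic are all hypothetical assumptions, and entity spans are assumed to come from an upstream NER step. The rule-based mode is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A recognized entity, as an upstream NER step might emit it (hypothetical format)."""
    text: str
    label: str  # entity type, e.g. "PER" (person) or "LOC" (location)
    start: int  # character offset of the entity in the sentence

    @property
    def end(self) -> int:
        return self.start + len(self.text)


# Hypothetical trigger lexicon: trigger word -> (relation, head type, tail type).
# In practice such entries would come from a curated policing thesaurus.
TRIGGERS = {
    "盗窃": ("stole_from", "PER", "PER"),  # "steal"
    "居住在": ("lives_in", "PER", "LOC"),  # "reside in"
}


def extract_relations(sentence, entities, triggers=TRIGGERS):
    """Return (head, relation, tail) triples anchored on trigger words.

    For each trigger occurrence, the head is the nearest entity of the
    required type that ends before the trigger, and the tail is the nearest
    entity of the required type that starts after it.
    """
    triples = []
    for word, (relation, head_type, tail_type) in triggers.items():
        pos = sentence.find(word)
        while pos != -1:
            trigger_end = pos + len(word)
            heads = [e for e in entities if e.label == head_type and e.end <= pos]
            tails = [e for e in entities if e.label == tail_type and e.start >= trigger_end]
            if heads and tails:
                head = max(heads, key=lambda e: e.end)    # closest on the left
                tail = min(tails, key=lambda e: e.start)  # closest on the right
                triples.append((head.text, relation, tail.text))
            pos = sentence.find(word, pos + 1)  # look for further occurrences
    return triples


# Usage: entity offsets are located with str.find purely for the demo.
sentence = "张三居住在人民路，5月3日盗窃了李四的电动车。"
entities = [Entity(t, lbl, sentence.find(t)) for t, lbl in [("张三", "PER"), ("人民路", "LOC"), ("李四", "PER")]]
print(extract_relations(sentence, entities))
# [('张三', 'stole_from', '李四'), ('张三', 'lives_in', '人民路')]
```

    Picking the nearest entity of the right type on each side is the simplest possible matching heuristic; it can mis-assign arguments in long sentences that contain several candidates of the same type.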

       
