Knowledge Extraction Based on Deep Semantics Analysis Towards Police Dossier (基于深度语义分析的警务卷宗知识抽取)

马健伟; 王铁鑫; 江宏; 陈涛; 张超; 李博涵

doi:10.7544/issn1000-1239.202330691

Abstract

Police dossiers are the main record of how public security organs handle and close cases, and they contain a large amount of crucial policing information. Extracting information from dossiers is an important means of analyzing cases, mining crime trends, and improving public security management. However, dossier text is mostly written in natural language by grassroots police officers, which makes key information hard to extract. Traditional dossier information extraction relies heavily on manual effort and predefined templates, resulting in low efficiency and poor generality. To address these problems, and taking the policing characteristics of dossiers into account, this paper proposes a dossier knowledge extraction method based on deep semantic analysis. The method consists of two core tasks: named entity recognition and relation extraction. The proposed named entity recognition method, which targets Chinese text, integrates the structural features and glyph features of Chinese characters. The proposed relation extraction method builds on the entity recognition results and, with the help of a specially constructed policing thesaurus, supports two extraction modes: one based on trigger rules and one based on trigger words. On a publicly available Weibo dataset and on a real dossier dataset from our project partner, the ** Branch in ** City, the proposed named entity recognition method shows better overall precision and recall than the baseline methods. The automatically extracted relations have also been verified and accepted by the ** Branch. The resulting information system has been deployed and is in use at the ** Branch.
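
To make the trigger-word extraction mode more concrete, the following is a minimal sketch, not the authors' implementation. It assumes entities have already been recognized and typed. The `Entity` class, the `TRIGGERS` lexicon, the label names, and the English example sentence are all hypothetical and chosen only for illustration. A real system would work on Chinese dossier text with a policing thesaurus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # hypothetical tag set, e.g. "PERSON", "ITEM", "LOCATION"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


# Hypothetical trigger lexicon: trigger word -> (relation, head label, tail label).
TRIGGERS = {
    "stole": ("THEFT_OF", "PERSON", "ITEM"),
    "lives in": ("RESIDES_AT", "PERSON", "LOCATION"),
}


def extract_relations(sentence, entities):
    """Emit (head, relation, tail) when a trigger word appears between
    two entities whose labels match the types the trigger expects."""
    triples = []
    ordered = sorted(entities, key=lambda e: e.start)
    for i, head in enumerate(ordered):
        for tail in ordered[i + 1:]:
            between = sentence[head.end:tail.start]
            for trigger, (relation, head_label, tail_label) in TRIGGERS.items():
                if (trigger in between
                        and head.label == head_label
                        and tail.label == tail_label):
                    triples.append((head.text, relation, tail.text))
    return triples


def find_entity(sentence, text, label):
    """Stand-in for an NER model: locate a known mention by string search."""
    start = sentence.index(text)
    return Entity(text, label, start, start + len(text))


sentence = "Zhang San stole a mobile phone and lives in Building 5."
entities = [
    find_entity(sentence, "Zhang San", "PERSON"),
    find_entity(sentence, "mobile phone", "ITEM"),
    find_entity(sentence, "Building 5", "LOCATION"),
]
triples = extract_relations(sentence, entities)
print(triples)
```

The type check on the head and tail labels is what keeps a trigger such as "stole" from linking a person to a location that merely appears later in the same sentence.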
