Abstract:
Police dossiers, among the main records handled by police departments, contain a large amount of crucial policing information. Efficient information extraction from police dossiers supports case analysis, crime trend prediction, and improved public security management. However, dossier text is written by police officers in natural language, which makes crucial information difficult to extract. Traditional information extraction from police dossiers relies heavily on manual effort and predefined templates, resulting in low efficiency and poor generality. Considering the particular characteristics of police dossiers, this paper proposes a knowledge extraction method based on deep semantic analysis. The method consists of two core tasks: named entity recognition and relation extraction. Focusing on Chinese text, we propose a named entity recognition method that integrates the structural and glyph features of Chinese characters. Building on the entity recognition results, and with the help of a specially constructed policing thesaurus, we propose a relation extraction method that combines rules with trigger words. On both a publicly available Weibo dataset and a real dossier dataset provided by our partner, a local police department, the proposed method outperforms several baseline named entity recognition models, both at classifying entities exactly and at finding more potential entities. The automatically extracted relations have also been verified and confirmed by the police department branch. A corresponding information system has been put into practical use.