
Structured Processing for Pathological Reports Based on Dependency Parsing

Tian Chiyuan, Chen Dehua, Wang Mei, Le Jiajin

Citation: Tian Chiyuan, Chen Dehua, Wang Mei, Le Jiajin. Structured Processing for Pathological Reports Based on Dependency Parsing[J]. Journal of Computer Research and Development, 2016, 53(12): 2669-2680. DOI: 10.7544/issn1000-1239.2016.20160611. CSTR: 32373.14.issn1000-1239.2016.20160611


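Funding: Shanghai Science and Technology Innovation Action Plan (15511106900); Shanghai Science and Technology Development Fund (16JC1400802); Fundamental Research Funds for the Central Universities, Donghua University Lizhi Program (B201312); Shanghai Special Fund for Informatization Development (XX-XXFZ-01-14-6349)
Details
  • Chinese Library Classification (CLC) number: TP391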


  • Abstract: The text in pathological examination reports is usually unstructured, which makes it hard for computers to analyze and process automatically. Current work on text structuring relies mainly on information and relation extraction, but the distinctive linguistic characteristics of pathological reports make relation extraction from Chinese text difficult. To address this problem, a structuring method designed for pathological reports is proposed. First, a synonym lexicon is built from the reports with a neural network language model, so that different words with the same meaning are merged. On this basis, dependency trees are generated for the report text, and two pruning strategies, short-sentence segmentation and information annotation, are applied to simplify the initial trees; this makes the grammatical relations clearer and improves the accuracy of the structured results. Finally, the dependency parsing results are used to extract examination indices and their corresponding values from Chinese reports, and a structured template is generated automatically. Experiments on real pathological reports used by physicians show that the method reaches an accuracy of 82.91% on index extraction and 79.11% on index-value extraction, respectively, providing a foundation for related research.
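The pipeline summarized in the abstract (synonym merging, short-sentence segmentation, then index/value extraction from dependency arcs) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the synonym table, the `Token` type, the function names, and the sample sentence are all hypothetical, and the `SBV` (subject-verb) arc label is assumed from a common Chinese dependency labelling convention rather than taken from the paper. The dependency arcs are supplied by hand here; in practice they would come from a parser.

```python
import re
from dataclasses import dataclass

# Hypothetical synonym table (variant -> canonical term). The paper derives
# its table from a neural network language model; this one is hand-written.
SYNONYMS = {"肿物": "肿块", "包块": "肿块"}


def normalize(text, synonyms=SYNONYMS):
    """Replace each synonym variant with its canonical term."""
    for variant, canonical in synonyms.items():
        text = text.replace(variant, canonical)
    return text


def split_short_sentences(text):
    """Cut a report sentence into short clauses at commas, semicolons, and
    the Chinese full stop, so each clause yields a smaller dependency tree."""
    parts = re.split(r"[,,;;。]", text)
    return [p.strip() for p in parts if p.strip()]


@dataclass
class Token:
    idx: int    # 1-based position in the clause
    word: str
    head: int   # index of the head token, 0 for the root
    rel: str    # dependency label (assumed label set, e.g. "SBV", "HED")


def extract_pairs(tokens):
    """Pair every subject (candidate index word) with the word it depends on
    (candidate index value). Only SBV arcs are used in this sketch."""
    by_idx = {t.idx: t for t in tokens}
    pairs = []
    for t in tokens:
        if t.rel == "SBV" and t.head in by_idx:
            pairs.append((t.word, by_idx[t.head].word))
    return pairs


if __name__ == "__main__":
    report = normalize("包块大小3cm,边界清楚。")
    print(split_short_sentences(report))
    # Hand-written parse of the clause "边界清楚" (border is clear).
    clause = [Token(1, "边界", 2, "SBV"), Token(2, "清楚", 0, "HED")]
    print(extract_pairs(clause))
```

Splitting before parsing keeps each tree small, which is the intent of the short-sentence pruning strategy; a real system would need more arc types than `SBV` to cover numeric values and modifiers.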
Publication history
  • Published: 2016-11-30
