Luo Zhiyong, Song Rou. Disambiguation in a Modern Chinese General-Purpose Word Segmentation System[J]. Journal of Computer Research and Development, 2006, 43(6): 1122-1128.

Disambiguation in a Modern Chinese General-Purpose Word Segmentation System

  • Published Date: June 14, 2006
  • Disambiguation is one of the most important components of a Chinese word segmentation system. A Chinese general-purpose word segmentation (GPWS) system places particularly high demands on disambiguation, because it allows users to create their own dictionaries dynamically and to apply multiple user dictionaries during segmentation. Based on an examination of the distribution and characteristics of ambiguous fragments (especially overlapping ambiguity fragments) in a large-scale real-world corpus, this paper proposes an improved forward maximum matching algorithm for detecting ambiguous fragments, together with a practical “rules + exceptions” disambiguation strategy. All overlapping ambiguity fragments (about 2.4 million occurrences) were extracted exhaustively from a People's Daily corpus of 100 million characters (approximately 234 MB), and randomized open tests of the strategy achieved an average accuracy of 99%.
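
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of textbook forward maximum matching and of one simple way to flag overlapping ambiguity fragments. It is not the improved algorithm proposed in the paper, whose details the abstract does not give. The function names, the `max_len` parameter, and the toy `LEXICON` are all assumptions made for illustration, and the sketch does not attempt the “rules + exceptions” disambiguation step.

```python
def forward_maximum_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position take the
    longest lexicon word, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def find_overlap_ambiguities(text, lexicon, max_len=4):
    """Flag spans where a lexicon word starts inside the word chosen by
    forward maximum matching and runs past its right boundary.
    Illustrative only; not the detection method from the paper."""
    fragments, i, n = [], 0, len(text)
    while i < n:
        # Forward maximum matching step: longest multi-character word at i.
        end = i + 1
        for size in range(min(max_len, n - i), 1, -1):
            if text[i:i + size] in lexicon:
                end = i + size
                break
        # Extend the span while some lexicon word crosses its right edge.
        frag_end, k = end, i + 1
        while k < frag_end:
            for size in range(min(max_len, n - k), 1, -1):
                if k + size > frag_end and text[k:k + size] in lexicon:
                    frag_end = k + size
                    break
            k += 1
        if frag_end > end:
            fragments.append(text[i:frag_end])
        i = frag_end
    return fragments


# Hypothetical toy lexicon, chosen only to trigger one overlap.
LEXICON = {"研究", "研究生", "生命", "起源"}

print(forward_maximum_match("研究生命起源", LEXICON))     # ['研究生', '命', '起源']
print(find_overlap_ambiguities("研究生命起源", LEXICON))  # ['研究生命']
```

In this example the greedy match picks 研究生 and leaves 命 stranded, while 生命 overlaps the boundary of 研究生. The span 研究生命 is therefore reported as an overlapping ambiguity fragment that a later disambiguation step would need to resolve.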
