    An Approach to Generate Semantic Network of Concept Based on Structural Corpus

    • Abstract: Concept semantic networks were proposed to address the vocabulary mismatch problem in information retrieval and are one of the basic ways to improve retrieval effectiveness. Recent literature in computational terminology has shown increasing interest in identifying semantic relations between concepts, which are important for large-scale natural language applications such as question answering (QA), information retrieval (IR), and machine translation (MT). Taking a natural-language-oriented Web answer system, NL-WAS, as the application background, this paper proposes an approach to automatically generating a semantic network of concepts from a semi-structured corpus. Based on the composition of the corpus, a different document extraction template is adopted for each of four kinds of relations between concepts: synonymy, hyponymy, hypernymy, and parataxis. A different window unit is set for each relation type to compute the degree of relatedness between concepts. The relations of each type are then obtained through threshold filtering, with thresholds chosen from experimental results, and through role switching. Finally, the semantic network is optimized and adjusted using rules. The algorithm has been implemented and applied in NL-WAS, and experimental results show that the resulting concept semantic network effectively improves question retrieval.
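    The abstract does not give the relatedness formula, the window definition, or the threshold values, so the following Python sketch only illustrates the general shape of the windowed relatedness and threshold filtering steps. It assumes a Dice-style co-occurrence score over fixed-size sliding windows; the function names `relatedness` and `filter_relations`, the `window` parameter, and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def relatedness(documents, window=5):
    """Score concept pairs by co-occurrence inside a sliding window.

    ASSUMPTION: a Dice-style score, 2 * n(a, b) / (n(a) + n(b)), where the
    counts are numbers of window units. The paper's actual measure is not
    stated in the abstract.
    `documents` is a list of token lists, each token being a concept.
    """
    concept_count = Counter()  # window units containing each concept
    pair_count = Counter()     # window units containing both concepts
    for tokens in documents:
        # A document shorter than the window yields a single unit.
        for start in range(max(1, len(tokens) - window + 1)):
            unit = set(tokens[start:start + window])
            concept_count.update(unit)
            pair_count.update(
                frozenset(pair) for pair in combinations(sorted(unit), 2)
            )
    return {
        tuple(sorted(pair)): 2 * n / sum(concept_count[c] for c in pair)
        for pair, n in pair_count.items()
    }


def filter_relations(scores, threshold):
    """Keep only the pairs whose score reaches the (hypothetical) threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}


# Hypothetical toy corpus, for illustration only.
docs = [["tcp", "ip", "udp"], ["tcp", "ip"]]
scores = relatedness(docs, window=2)
strong = filter_relations(scores, threshold=0.6)
```

    Under these assumptions, a separate window size and threshold could be passed for each relation type (synonymy, hyponymy, hypernymy, parataxis), which mirrors the per-type settings the abstract describes without claiming to reproduce them.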

       
