    Dynamic Parameter Learning Approach for Information Retrieval with Genetic Algorithm

    • Abstract: Parameter settings in information retrieval (IR) systems largely determine retrieval performance. Because these parameters are data-dependent and sensitive, empirically chosen values are often unreliable. Moreover, supervised parameter learning is not applicable, because relevance information about documents for the current query is not available at retrieval time. An automatic, unsupervised parameter learning mechanism is therefore necessary. This paper first examines the traditional practice of fixing system parameters at empirical values; the results show that this practice generalizes poorly and that its effectiveness is unstable and unreliable. It then proposes a dynamic parameter learning approach based on a genetic algorithm (GA). Experiments were conducted on the Okapi system using three large-scale standard web test collections, the TREC11, TREC10, and TREC9 web track collections, each larger than 10 GB. The results show that, with dynamic parameter learning, system performance always approaches or even reaches the best achievable retrieval performance.

       
