    Query Expansion Based on Integer Linear Programming

    Selecting Expansion Terms as a Set Via Integer Linear Programming

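    • Abstract (translated from the Chinese): Query expansion is an important step in information retrieval. Most existing query expansion methods assess the importance of each candidate term in isolation and select the few highest-scoring terms as the expansion. Prior work has shown, however, that combining the individually best expansion terms does not necessarily give the best result. This paper attempts to select expansion terms as a set: first, the weights of individual expansion terms and the constraint relations among them are learned with a supervised method; then, subject to a number of constraints, the query expansion task is cast as an integer linear programming problem. Solving this global optimization problem yields the best combination of expansion terms. Finally, comparative experiments on three standard TREC collections show that the method significantly improves the effectiveness of query expansion.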

       

      Abstract: Query expansion is an important step in information retrieval, and many studies have shown that it improves retrieval performance. Most query expansion methods score each candidate term separately and select the few best terms as the expansion. However, several studies have shown that combining the individually best terms does not guarantee the best expansion. This paper examines the impact of term combinations and finds that they strongly affect retrieval performance. To address this problem, it selects expansion terms as a set instead of one by one. The query expansion task is formulated as an integer linear programming problem, which is a global optimization problem. In this model, the weight of each candidate term is learned by a supervised method, and a few constraints are added to capture the relations among terms. Solving this global optimization problem yields the best combination of expansion terms. Experiments on three TREC collections show that, compared with the baseline query expansion method, selecting expansion terms as a set via integer linear programming greatly improves retrieval effectiveness.
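
      To make the set-selection idea concrete, the following is a minimal sketch of a 0-1 integer linear program of this general shape. It is not the paper's model: the abstract does not state the actual objective or constraints, so the term weights, the pairwise "conflict" constraint (x_a + x_b <= 1), the size limit, and the function name `select_expansion_terms` are all assumptions made for illustration. The sketch also solves the program by plain enumeration rather than with an ILP solver, which is exact but only practical for a handful of candidate terms.

```python
from itertools import combinations


def select_expansion_terms(weights, conflicts, max_terms):
    """Pick the feasible term set with the highest total weight.

    Assumed 0-1 ILP (illustrative, not taken from the paper):
        maximize   sum_i w_i * x_i
        subject to sum_i x_i <= max_terms
                   x_a + x_b <= 1   for every (a, b) in conflicts
                   x_i in {0, 1}
    Solved here by exhaustive enumeration of all subsets up to max_terms.
    """
    terms = sorted(weights)
    best_set, best_score = frozenset(), 0.0  # the empty expansion scores 0
    for k in range(1, max_terms + 1):
        for subset in combinations(terms, k):
            chosen = set(subset)
            # Skip sets that violate a pairwise constraint.
            if any(a in chosen and b in chosen for a, b in conflicts):
                continue
            score = sum(weights[t] for t in subset)
            if score > best_score:
                best_set, best_score = frozenset(subset), score
    return best_set, best_score


# Hypothetical learned weights for four candidate expansion terms.
weights = {"engine": 0.9, "motor": 0.8, "fuel": 0.5, "noise": -0.2}
# Hypothetical constraint: two near-synonyms should not both be selected.
conflicts = [("engine", "motor")]

best_set, best_score = select_expansion_terms(weights, conflicts, max_terms=3)
print(sorted(best_set), round(best_score, 2))
```

      In this toy instance the two individually best terms ("engine" and "motor") cannot be chosen together, so ranking terms one by one and taking the top of the list would produce an infeasible set, while the joint selection settles on a different combination. For realistic candidate lists, the same program would be handed to an ILP solver instead of being enumerated.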

       
