Abstract:
Query expansion is an important step in information retrieval, and many studies have shown that it improves retrieval performance. Most query expansion methods evaluate each expansion term separately and select the several best-scoring terms as the expansion. However, several studies have shown that combining individually best terms does not guarantee the best query expansion. This paper examines the impact of term combinations, and the results show that they strongly affect retrieval performance. To address this problem, this paper selects expansion terms as a whole rather than one by one. The query expansion task is formulated as an integer linear programming (ILP) problem, which is a global optimization problem. In this model, the weight of each candidate term is learned by a supervised method, and several constraints are added to capture the relations among terms. Solving this global optimization problem yields the best query expansion under the model. Finally, experiments on three TREC collections show that, compared with the baseline query expansion method, selecting expansion terms as a set via integer linear programming greatly improves retrieval effectiveness.
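The difference between picking terms one by one and picking them as a set can be illustrated with a small sketch. It makes several assumptions that the abstract does not state. The objective is taken to be linear over learned term weights, here replaced by hypothetical values. The number of expansion terms is capped at k. Term relations are modeled by a hypothetical pairwise constraint x_i + x_j <= 1, so that two redundant terms cannot both be chosen. The actual constraints used in the paper are not given here, and a real system would pass the program to an ILP solver rather than enumerate every subset as this toy does.

```python
from itertools import combinations

# Hypothetical learned weights for candidate expansion terms (illustrative values only).
weights = {"car": 0.9, "automobile": 0.85, "engine": 0.6, "fuel": 0.5}

# Hypothetical term relation: "car" and "automobile" are redundant, so at most one
# may be selected. This encodes the linear constraint x_car + x_automobile <= 1.
exclusive_pairs = [("car", "automobile")]


def greedy_top_k(weights, k):
    """Baseline: rank terms independently and keep the k highest-weighted ones."""
    return set(sorted(weights, key=weights.get, reverse=True)[:k])


def select_terms(weights, exclusive_pairs, k):
    """Solve a tiny 0-1 ILP exactly by exhaustive enumeration:
    maximize   sum_i w_i * x_i
    subject to sum_i x_i <= k
               x_i + x_j <= 1 for every exclusive pair (i, j)
               x_i in {0, 1}
    Enumeration is exponential in the number of terms and only suits toy inputs."""
    terms = sorted(weights)
    best_set, best_score = (), 0.0
    for size in range(k + 1):
        for subset in combinations(terms, size):
            chosen = set(subset)
            # Skip assignments that violate any pairwise exclusion constraint.
            if any(a in chosen and b in chosen for a, b in exclusive_pairs):
                continue
            score = sum(weights[t] for t in subset)
            if score > best_score:
                best_set, best_score = subset, score
    return set(best_set), best_score


if __name__ == "__main__":
    print("greedy:", greedy_top_k(weights, 2))
    print("ILP:   ", select_terms(weights, exclusive_pairs, 2))
```

With k = 2, ranking terms independently picks "car" and "automobile", a pair the hypothetical redundancy constraint forbids. Optimizing over whole sets instead returns {"car", "engine"} with a total weight of 1.5.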