ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (10): 2216-2224. doi: 10.7544/issn1000-1239.2014.20130339


Exploration of Weighted Proximity Measure in Information Retrieval

Xue Yuanhai1,2,3, Yu Xiaoming1,2, Liu Yue1,2, Guan Feng1,2,3, Cheng Xueqi1,2   

  1. Key Laboratory of Network Data Science and Technology, Chinese Academy of Sciences, Beijing 100190; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 3. University of Chinese Academy of Sciences, Beijing 100190
  • Online: 2014-10-01

Abstract: A key problem in information retrieval is to provide users with relevant, accurate, and even complete information. Many traditional information retrieval models rest on the bag-of-words assumption and ignore the implied associations among query terms. Although term proximity has been widely used to boost the performance of classical information retrieval models, most of these efforts do not fully account for the differing importance of the query terms. In modern information retrieval, query terms are not only dependent on each other but also differ in importance. Computing term proximity in a way that takes the differing importance of terms into account should therefore help improve retrieval performance. To this end, a weighted term proximity measure is introduced that distinguishes the significance of the query terms based on the collection being searched. A weighted proximity BM25 model (WP-BM25), which integrates this measure into the Okapi BM25 model, is proposed to rank the retrieved documents. Extensive experiments are conducted on three standard TREC collections: FR88-89, WT2G, and WT10G. The results show that the weighted proximity BM25 model significantly improves retrieval performance and is robust.
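
The abstract does not state the WP-BM25 formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that term significance can be approximated by an IDF computed on the searched collection, that proximity is the inverse squared minimum distance between two query terms, and that the proximity bonus is mixed linearly into the Okapi BM25 score. The function names and the mixing weight `lam` are hypothetical.

```python
import math
from collections import Counter


def idf(term, docs):
    """Collection-based term weight (BM25-style IDF, kept positive by the +1)."""
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)


def bm25(query, doc, docs, k1=1.2, b=0.75):
    """Plain Okapi BM25 score of one tokenized document for a tokenized query."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        f = tf[term]
        if f == 0:
            continue
        norm = f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf(term, docs) * norm
    return score


def weighted_proximity(query, doc, docs):
    """Assumed proximity bonus: for each pair of distinct query terms present in
    the document, add idf(a) * idf(b) / (minimum token distance)^2."""
    wanted = set(query)
    positions = {}
    for i, tok in enumerate(doc):
        if tok in wanted:
            positions.setdefault(tok, []).append(i)
    terms = sorted(positions)
    total = 0.0
    for i, a in enumerate(terms):
        for c in terms[i + 1:]:
            dist = min(abs(p - q) for p in positions[a] for q in positions[c])
            total += idf(a, docs) * idf(c, docs) / dist ** 2
    return total


def wp_bm25_sketch(query, doc, docs, lam=0.5):
    """Hypothetical combination: BM25 plus lam times the weighted proximity bonus."""
    return bm25(query, doc, docs) + lam * weighted_proximity(query, doc, docs)


d1 = ["weighted", "proximity", "improves", "retrieval", "quality"]
d2 = ["weighted", "retrieval", "improves", "quality", "proximity"]
d3 = ["unrelated", "text", "about", "something", "else"]
docs = [d1, d2, d3]
query = ["weighted", "proximity"]
ranked = sorted(docs, key=lambda d: wp_bm25_sketch(query, d, docs), reverse=True)
```

In this toy collection `d1` and `d2` contain the same terms and have the same length, so plain BM25 cannot separate them; the proximity bonus ranks `d1` first because its two query terms are adjacent.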

Key words: weighted proximity, measure method, BM25, term significance, information retrieval
