Pu Qiang, He Daqing, Yang Guowei. An Estimation of Query Language Model Based on Statistical Semantic Clustering[J]. Journal of Computer Research and Development, 2011, 48(2): 224-231.
1(School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054) 2(School of Information Sciences, University of Pittsburgh, Pittsburgh 15260)
How to generate clusters effectively and use the information they contain is an important research direction in information retrieval. Assuming that a document contains a set of independent hidden topics, a document can be viewed as an interaction of these independent hidden topics with some noise. Based on this assumption, a novel semantic clustering technique using independent component analysis is proposed. The topic separation capability of independent component analysis is used to group a set of documents into different semantic clusters according to the hidden independent components in semantic space. Within the language modeling framework, a user's initial query activates a particular semantic cluster. A new query language model is then estimated from the user's initial query model and a feedback semantic topic model, which is itself estimated from the information in the activated semantic cluster. The estimated query model is evaluated in experiments on five TREC data sets. The experimental results show that the semantic cluster based query model significantly improves retrieval performance over traditional query models and other cluster based language models. The improvement comes mainly from estimating the query model on the semantic cluster that is most similar to the user's query.
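The query model estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clusters are assumed to be given already (the independent component analysis step is not shown), the cluster is activated by a simple query likelihood score, and the new query model is formed by linear interpolation with a hypothetical weight `alpha`. The function names, the toy documents, and the `floor` value used in place of proper smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram model over whitespace-tokenized texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def query_log_likelihood(query_model, cluster_model, floor=1e-6):
    """Expected log-probability of the query under a cluster model.

    `floor` is a crude stand-in for proper smoothing of unseen words.
    """
    return sum(p * math.log(cluster_model.get(w, floor))
               for w, p in query_model.items())


def activate_cluster(query_model, cluster_models):
    """Return the index of the cluster model that scores the query highest."""
    scores = [query_log_likelihood(query_model, m) for m in cluster_models]
    return max(range(len(scores)), key=scores.__getitem__)


def interpolate(query_model, topic_model, alpha=0.5):
    """Mix the initial query model with the feedback topic model.

    Linear interpolation is an assumption; the paper may use another estimator.
    """
    vocab = set(query_model) | set(topic_model)
    return {
        w: (1 - alpha) * query_model.get(w, 0.0) + alpha * topic_model.get(w, 0.0)
        for w in vocab
    }


# Toy clusters, assumed to come from a prior semantic clustering step.
clusters = [
    ["jaguar speed engine car", "car engine repair"],
    ["jaguar habitat rainforest cat", "big cat prey rainforest"],
]
cluster_models = [unigram_model(docs) for docs in clusters]

query_model = unigram_model(["jaguar rainforest"])
active = activate_cluster(query_model, cluster_models)
new_query_model = interpolate(query_model, cluster_models[active], alpha=0.4)
```

In this toy setting the query activates the second cluster, and the new query model gains probability mass on words from that cluster only, such as "cat" and "habitat". Words from the other cluster, such as "car", receive none.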