Query Classification Based on URL Topic
Abstract
Many online resources embody crowd intelligence. A categorized website directory is one such resource: it is constructed and maintained manually, and it organizes websites according to a topical taxonomy. From the topically labeled URLs in such a directory, a URL topic classifier can be built. Combining this classifier with pseudo-relevance feedback and search engine query logs, we propose an automatic, fast, and efficient method for query topic classification. The method combines two strategies. Strategy-1 predicts a query’s topic by computing the topic distribution over the URLs that a search system returns for that query. Strategy-2 trains a statistical classifier on queries from the query logs that are labeled automatically with the topics of their clicked URLs. Experimental results show that our method achieves better precision than a state-of-the-art algorithm and is more efficient for online processing. It also scales well and can construct large-scale training data from query logs automatically.
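To make the two strategies concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a toy directory that maps host names to topic labels. The names `DIRECTORY`, `url_topic`, `classify_by_results`, and `label_queries_from_clicks` are hypothetical, and a simple host lookup stands in for the URL topic classifier.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stand-in for a website directory: host name -> topic label.
DIRECTORY = {
    "espn.com": "Sports",
    "nba.com": "Sports",
    "imdb.com": "Arts",
}

def url_topic(url, directory=DIRECTORY):
    """Return the topic of a URL by host lookup, or None if the host is unknown."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return directory.get(host)

def classify_by_results(result_urls, directory=DIRECTORY):
    """Strategy-1 (sketch): topic distribution over the URLs returned for a query."""
    topics = (url_topic(u, directory) for u in result_urls)
    counts = Counter(t for t in topics if t is not None)
    total = sum(counts.values())
    if total == 0:
        return None, {}
    dist = {t: c / total for t, c in counts.items()}
    return max(dist, key=dist.get), dist

def label_queries_from_clicks(click_log, directory=DIRECTORY):
    """Strategy-2 (sketch): label logged queries with the topic of the clicked URL."""
    labeled = []
    for query, clicked_url in click_log:
        topic = url_topic(clicked_url, directory)
        if topic is not None:
            labeled.append((query, topic))
    return labeled
```

Under these assumptions, the pairs returned by `label_queries_from_clicks` could serve as training data for any statistical text classifier, while `classify_by_results` needs only the result list for the query at hand.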