
    Query Classification Based on URL Topic

    • Abstract: Many online resources embody the collective intelligence of their users. A categorized website directory is one such resource: it is built and maintained by hand and organizes websites according to a topical taxonomy. Using the topic-labeled URLs in such a directory, we design a URL topic classifier and combine it with pseudo-relevance feedback and search engine query logs to obtain an automatic, fast, and effective method for classifying queries by topic. The method combines two strategies. Strategy 1 predicts a query's topic from the topic distribution of the URLs in its search results. Strategy 2 uses click-through relations in the query log: queries are labeled automatically with the topics of the topic-labeled URLs clicked for them, and the labeled queries are used to train a statistical classifier that predicts query topics. Experiments show that the method achieves better precision than a state-of-the-art algorithm and is more efficient in online processing. Because its training data can be obtained automatically from query logs, it also scales well.
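The two strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `DIRECTORY` table is hypothetical toy data, a plain host lookup stands in for the URL topic classifier, and taking the most frequent topic is an assumed decision rule. For Strategy 2 the sketch stops at the automatic labeling step; training a statistical classifier on the labeled queries is left out.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical directory: host -> topic label (toy data, not from the paper).
DIRECTORY = {
    "espn.com": "Sports",
    "nba.com": "Sports",
    "webmd.com": "Health",
}

def url_topic(url):
    """Assumed stand-in for the URL topic classifier: look up the host."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return DIRECTORY.get(host)  # None when the host is not in the directory

def topic_distribution(urls):
    """Normalized topic counts over the URLs that have a known topic."""
    counts = Counter(t for t in map(url_topic, urls) if t is not None)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

def predict_from_results(result_urls):
    """Strategy 1 (sketch): pick the most frequent topic among result URLs."""
    dist = topic_distribution(result_urls)
    return max(dist, key=dist.get) if dist else None

def label_queries_from_clicks(click_log):
    """Strategy 2, labeling step (sketch): click_log holds (query, clicked_url)
    pairs; each query gets the most frequent topic of its clicked URLs."""
    clicked = {}
    for query, url in click_log:
        clicked.setdefault(query, []).append(url)
    labels = {}
    for query, urls in clicked.items():
        topic = predict_from_results(urls)
        if topic is not None:  # skip queries with no topic-labeled clicks
            labels[query] = topic
    return labels
```

In this sketch, `predict_from_results` would be called on the top-ranked URLs returned for a query, while `label_queries_from_clicks` would be run offline over a query log to produce training pairs for a separate classifier.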

       
