Empirical Study on Rare Query Categorization
Abstract
Rare queries are queries that users submit to search engines very infrequently. They account for a large fraction of distinct queries and strongly affect the user experience. However, existing user behavior analyses have paid little attention to rare queries because of data sparseness. In this paper, we present an empirical study that characterizes user behavior on rare queries and gives an overview of the composition of rare queries, using large-scale search logs collected from a commercial search engine. Based on an analysis of several features that describe behavior in the goal query, in related queries, and over the entire session, we propose a semi-supervised categorization framework and use a modified AdaBoost to classify rare sessions. The results are evaluated on 2,000 randomly sampled rare sessions, and the average AUC is over 83%. This work should be helpful for Web search research, including user behavior analysis of rare queries.