高级检索

    代价敏感的列表排序算法

    Cost-Sensitive Listwise Ranking Approach

    • 摘要: 排序学习是信息检索与机器学习中的研究热点之一.在信息检索中,预测排序列表中顶部排序非常重要.但是,排序学习中一类经典的排序算法——列表排序算法——无法强调预测排序列表中顶部排序.为了解决此问题,将代价敏感学习的思想融入到列表排序算法中,提出代价敏感的列表排序算法框架.该框架是在列表排序算法的损失函数中对文档引入权重,且基于性能评价指标NDCG计算文档的权重.在此基础之上,进一步证明了代价敏感的列表排序算法的损失函数是NDCG损失的上界.为了验证代价敏感的列表排序算法的有效性,在此框架下提出了一种代价敏感的ListMLE排序算法,并对该算法开展序保持与泛化性的理论研究工作,从理论上验证了该算法具有序保持特性.在基准数据集上的实验结果表明,在预测排序列表中顶部排序中,代价敏感的ListMLE比传统排序学习算法能取得更好的性能.

       

      Abstract: Learning to rank is a popular research area in machine learning and information retrieval (IR). In IR, the ranking order on the top of the ranked list is very important. However, listwise approach, a kind of classical approach in learning to rank, cannot emphasize the ranking order on the top of the ranked list. To solve the problem, the idea of cost-sensitive learning is brought into the listwise approach, and a framework for cost-sensitive listwise approach is established. The framework imposes weights for the documents in the listwise loss functions. The weights are computed based on evaluation measure: Normalized Discounted Cumulative Gain (NDCG). It is proven that the losses of cost-sensitive listwise approaches are the upper bound of the NDCG loss. As an example, a cost-sensitive ListMLE method is proposed. Moreover, the theoretical analysis is conducted on the order preservation and generalization of cost-sensitive ListMLE. It is proven that the loss function of cost-sensitive ListMLE is order preserved. Experimental results on the benchmark datasets indicate that the cost-sensitive ListMLE achieves higher ranking performance than the baselines on the top of the ranked list.

       

    /

    返回文章
    返回