    Multi-Granularity Based Feature Interaction Pruning Model for CTR Prediction

    • Abstract: Learning effective high-order feature interactions is crucial for click-through rate (CTR) prediction in recommender systems. Existing methods that build high-order feature combinations by reassembling low-order ones, i.e., second-order feature interactions, must compute an interaction weight for every pairwise feature interaction, so their time complexity grows exponentially with the feature dimension. Deep neural network-based methods can be viewed as universal function approximators that can potentially learn all kinds of feature interactions. However, they have been shown to approximate low-order interactions, i.e., second- or third-order feature interactions, inefficiently, which may reduce the accuracy of CTR prediction. To address these problems, we propose FeatNet, a multi-granularity feature interaction pruning network for CTR prediction. First, at the explicit feature granularity, FeatNet applies a threshold pruning operation to generate different feature subsets that retain the meaningful feature combinations, which preserves the diversity of feature combinations and reduces the complexity of high-order feature interactions. Then, based on the pruned feature subsets, FeatNet performs implicit high-order feature interactions at the granularity of feature elements, where a filter automatically removes invalid feature interactions. Extensive experiments on two real-world datasets show that FeatNet achieves the best CTR prediction performance.
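The two granularities described in the abstract can be illustrated with a toy sketch. This is not FeatNet itself: the abstract gives no equations, so the gate scores, the Hadamard-product interaction, the element-level filter vectors, and all function names below (`prune_fields`, `elementwise_interaction`, `predict_ctr`) are assumptions chosen only to make the idea concrete. In a real model the gates and filters would presumably be learned; here they are fixed numbers.

```python
import math

def prune_fields(gate_scores, threshold):
    """Feature granularity: keep the fields whose (assumed) gate score exceeds the threshold."""
    return [i for i, score in enumerate(gate_scores) if score > threshold]

def elementwise_interaction(embeddings, kept, element_filter):
    """Element granularity: Hadamard product of the kept field embeddings,
    followed by an (assumed) per-element filter that can zero out dimensions."""
    dim = len(element_filter)
    if not kept:
        # An empty subset contributes nothing to the prediction.
        return [0.0] * dim
    product = [1.0] * dim
    for i in kept:
        product = [p * e for p, e in zip(product, embeddings[i])]
    return [p * f for p, f in zip(product, element_filter)]

def predict_ctr(embeddings, subset_gates, element_filters, threshold=0.5):
    """Sum the filtered interactions of every pruned subset and squash with a sigmoid."""
    logit = 0.0
    for gates, filt in zip(subset_gates, element_filters):
        kept = prune_fields(gates, threshold)
        logit += sum(elementwise_interaction(embeddings, kept, filt))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical toy input: three fields with 2-dimensional embeddings.
embeddings = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
# Two subsets, each with its own gate scores over the three fields.
subset_gates = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.6]]
# One element-level filter per subset; 0.0 drops that embedding dimension.
element_filters = [[1.0, 0.0], [1.0, 1.0]]

ctr = predict_ctr(embeddings, subset_gates, element_filters)
```

With these inputs, the first subset keeps fields 0 and 1 and the second keeps fields 1 and 2, so each subset interacts only a pruned set of fields instead of all of them, which is the source of the complexity reduction the abstract claims.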

       
