ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (5): 920-932. doi: 10.7544/issn1000-1239.2018.20160926

• Artificial Intelligence •



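  1. 1(College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094); 2(Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing 210003)
  • Published: 2018-05-01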

A Recommendation Engine for Travel Products Based on Topic Sequential Patterns

Zhu Guixiang1, Cao Jie1,2

  1. 1(College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094); 2(Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing 210003)
  • Online: 2018-05-01

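Abstract (Chinese-language version): Travel product recommendation is an emerging topic in recommender systems research. Because travel product descriptions are multi-dimensional and complex, the user-product matrix is extremely sparse, and the cold-start problem is prominent, collaborative filtering, which has succeeded in e-commerce, is often hard to apply directly to travel products. This paper proposes SECT, a travel product recommendation engine based on topic sequential patterns, which generates recommendations by mining the click logs of an online travel website. First, topics are mined from the semantic description text of pages in order to capture user behavior patterns at a generalized level. Second, frequent sequential patterns and their candidate product sets are mined from the time-series data of page visits to form a sequential pattern library. Finally, a Markov n-gram model is proposed to match a user's real-time click stream against the pattern library. To improve the efficiency of online matching, a new multi-way tree data structure, the PSC-tree, is designed to store the historical pattern library and connect seamlessly with the online computation module. Experiments on a real travel dataset show that the engine outperforms traditional recommendation algorithms and effectively improves both the recommendation rate and the accuracy for cold-start users. SECT also outperforms the baseline algorithms when recommending long-tail items.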

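Keywords (Chinese-language version): travel product recommendation, frequent sequential patterns, cold-start users, Web log data, recommender systems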

Abstract: Travel product recommendation has become one of the emerging issues in the field of recommender systems. The widely used collaborative filtering algorithms are usually hard to apply to travel products for three reasons: 1) the content of travel products is very complex, 2) the user-item matrix is extremely sparse, and 3) cold-start users are common. To tackle these issues, we exploit Web server logs to generate recommendations and present a novel recommendation engine for travel products based on topic sequential patterns (SECT for short). Specifically, we first extract topics from the semantic description of every Web page. Then, we mine frequent topic sequential patterns and their target products to form a click pattern library. Finally, we propose a Markov n-gram model that matches a user's real-time click stream against the click pattern library and thereby computes recommendation scores. To make online computation more efficient, we design a new multi-branch tree data structure called PSC-tree, which stores the historical click pattern library and integrates seamlessly with the online computing module. Experimental results on a real-world travel dataset demonstrate that SECT outperforms the state-of-the-art baseline algorithms. In particular, SECT improves both coverage and accuracy when recommending products to cold-start users. SECT is also effective at recommending long-tail items, where it again outperforms the baseline algorithms.

Key words: travel recommendation, frequent sequential pattern, cold-start users, Web server logs, recommender system
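
The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PSC-tree or their Markov n-gram model, whose details the abstract does not give. It assumes a plain prefix tree keyed by topic, a simple back-off over the last n clicked topics, and a hypothetical `decay` weight. All class, function, topic, and product names are invented for illustration.

```python
from collections import Counter, defaultdict


class PatternNode:
    """One node of a prefix tree keyed by topic (hypothetical layout)."""

    def __init__(self):
        self.children = {}         # topic -> PatternNode
        self.products = Counter()  # candidate product -> support count


class PatternTree:
    """Stores mined topic sequences with their candidate products."""

    def __init__(self):
        self.root = PatternNode()

    def add_pattern(self, topics, products):
        """Insert one topic sequence and add its candidate product counts."""
        node = self.root
        for topic in topics:
            node = node.children.setdefault(topic, PatternNode())
        node.products.update(products)

    def lookup(self, topics):
        """Return the node for an exact topic sequence, or None if absent."""
        node = self.root
        for topic in topics:
            node = node.children.get(topic)
            if node is None:
                return None
        return node


def recommend(tree, session_topics, n=3, top_k=2, decay=0.5):
    """Rank products using the last n clicked topics with back-off.

    The full n-topic context has weight 1; every leading topic that is
    dropped multiplies the weight by `decay` (an assumed heuristic).
    """
    scores = defaultdict(float)
    context = list(session_topics[-n:])
    weight = 1.0
    while context:
        node = tree.lookup(context)
        if node is not None and node.products:
            total = sum(node.products.values())
            for product, count in node.products.items():
                scores[product] += weight * count / total
        context = context[1:]  # back off to a shorter suffix
        weight *= decay
    # Sort by descending score, breaking ties by product name.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_k]]


# Toy pattern library with made-up topics and products.
demo_tree = PatternTree()
demo_tree.add_pattern(["beach", "island"], {"P1": 3, "P2": 1})
demo_tree.add_pattern(["island"], {"P2": 2, "P3": 2})
print(recommend(demo_tree, ["city", "beach", "island"]))
```

Because shorter suffixes still match, a session with a single click can receive candidates. This is one plausible reading of why a pattern-based approach helps cold-start users, though the paper's actual scoring may differ.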