ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2017, Vol. 54, Issue (7): 1488-1499. doi: 10.7544/issn1000-1239.2017.20160556

• Artificial Intelligence •

基于贝叶斯网的评价数据分析和动态行为建模

王飞1,岳昆1,孙正宝2,武浩1,冯辉1   

  1. 1(云南大学信息学院 昆明 650504);2(云南大学科技处 昆明 650504) (wangfei_989@163.com)
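  • Published: 2017-07-01
  • Supported by: 
    National Natural Science Foundation of China (61472345, 61562090); Key Project of the Applied Basic Research Program of Yunnan Province (2014FA023); Program for Young Talents of Yunnan University (WX173602); Program for Innovative Research Team of Yunnan University (XT412011); Scientific Research Fund of the Yunnan Provincial Department of Education (2016ZZX006)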

Analyzing Rating Data and Modeling Dynamic Behaviors of Users Based on the Bayesian Network

Wang Fei1, Yue Kun1, Sun Zhengbao2, Wu Hao1, Feng Hui1   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650504; 2. Department of Science and Technology, Yunnan University, Kunming 650504
  • Online: 2017-07-01

摘要: 随着Web2.0的不断普及和电子商务应用的迅速发展,大规模的在线评价数据不断产生,使用户行为数据分析和用户行为建模成为可能,具有重要意义.考虑到用户评价数据和评价行为的动态性,提出以带有隐变量的贝叶斯网作为各属性间依赖关系及其不确定性表示的基本框架,构建既能刻画用户评价数据中各属性间相互依赖的不确定性、也能描述用户行为动态性的评价行为模型.首先,以贝叶斯信息标准(BIC)分值作为模型与数据拟合度的度量标准,提出基于打分搜索方法来构建各时间片的隐变量模型,并给出基于期望最大(EM)算法的隐变量取值填充方法;其次,基于条件互信息和时序的不可逆性,提出了相邻时间片间隐变量模型的构建方法.建立在MovieLens数据集上的实验结果验证了提出的动态用户行为建模方法的高效性及有效性.

关键词: 用户评价数据, 时序性, 隐变量模型, 贝叶斯网, 动态行为建模

Abstract: With the spread of Web 2.0 and the rapid growth of e-commerce applications, large-scale online rating data are continuously generated, which makes it possible to analyze user behavior data and to model user behaviors. Considering the dynamic nature of rating data and rating behaviors, we adopt the Bayesian network with a latent variable (abbreviated as latent variable model) as the framework for representing the dependencies among attributes and their uncertainties, and construct a rating behavior model that reflects both the uncertainty of the dependence relationships among attributes in rating data and the dynamics of user behaviors. First, we use the Bayesian information criterion (BIC) score to measure how well a candidate model fits the rating data, propose a scoring-and-search method to construct the latent variable model for each time slice, and give a method for filling in the values of the latent variable based on the expectation maximization (EM) algorithm. Second, based on conditional mutual information and the irreversibility of time series, we propose a method for constructing the latent variable model between adjacent time slices. Finally, experimental results on the MovieLens data set verify the efficiency and effectiveness of the proposed method.

Key words: user rating data, time series, latent variable model, Bayesian network, dynamic behavior modeling
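
The two quantities named in the abstract, the BIC score and conditional mutual information, can be illustrated with a minimal Python sketch for fully observed discrete data. This is not the authors' implementation: the function names, the count-based estimators, and the choice to count only observed parent configurations as parameters are assumptions made for illustration, and the latent variable (which the paper fills in with EM) is not handled here.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Estimate I(X; Y | Z) in nats from equal-length sequences of discrete values."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), c in n_xyz.items():
        # p(x,y,z) * log( p(x,y|z) / (p(x|z) * p(y|z)) ), written with raw counts
        cmi += (c / n) * math.log(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
    return cmi


def bic_family_score(child, parents, n_child_states):
    """BIC contribution of one node given its parents.

    child:   sequence of discrete values of the node
    parents: sequence of tuples, one parent configuration per sample
             (use () for a node without parents)
    Assumption: free parameters are counted over the parent
    configurations that actually occur in the data.
    """
    n = len(child)
    joint = Counter(zip(parents, child))
    marginal = Counter(parents)
    # Maximum-likelihood log-likelihood of the child given its parents
    loglik = sum(c * math.log(c / marginal[pa]) for (pa, _), c in joint.items())
    n_params = len(marginal) * (n_child_states - 1)
    return loglik - 0.5 * math.log(n) * n_params


if __name__ == "__main__":
    # Hypothetical toy data: y copies x, z is constant
    x = [0, 1, 0, 1]
    y = [0, 1, 0, 1]
    z = [0, 0, 0, 0]
    print(conditional_mutual_information(x, y, z))
    print(bic_family_score(y, [(v,) for v in x], 2))
    print(bic_family_score(y, [() for _ in x], 2))
```

In a scoring-and-search procedure, a score of this kind would be summed over all nodes of a candidate structure and compared across candidates. For edges between adjacent time slices, one plausible reading of the abstract is that a dependency is considered only from the earlier slice to the later one when the conditional mutual information exceeds some threshold; the threshold and the exact procedure are not given in this record.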

CLC number: