ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (4): 742-751. doi: 10.7544/issn1000-1239.2016.20151143

• Network Technology •

A Caching Strategy for Internet Plus TV Based on Popularity Prediction

Zhu Chengang, Cheng Guang, Hu Yifei, Wang Yuxiang

  1. (School of Computer Science and Engineering, Southeast University, Nanjing 211189) (Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211189) (gcheng@njnet.edu.cn)
  • Published: 2016-04-01
  • Online: 2016-04-01
  • Supported by:
    the National High Technology Research and Development Program of China (863 Program) (2015AA015603); the Prospective Research Project on Future Networks of the Jiangsu Future Networks Innovation Institute (BY2013095-5-03); the "Six Talent Peaks" High-Level Talent Project of Jiangsu Province (2011-DZ024)

Abstract: Internet plus TV platforms tend to consume excessive storage space in pursuit of a higher cache hit ratio for popular programs. To address this problem, this paper proposes PPRA (popularity prediction replication algorithm), a cache scheduling algorithm based on program popularity prediction. First, based on a statistical analysis of actual measurement data, the random forests (RF) algorithm is used to build a program popularity prediction model. To counter the curse of dimensionality in the selected features, principal component analysis (PCA) is applied for feature dimensionality reduction, so that predicted popularity values can be computed quickly. Programs in the cache are then scheduled according to the predicted popularity. Finally, PPRA is evaluated on 120 days of viewing data from the 1.3 million users of a cable operator. The experimental results show that, for a given cache hit ratio, PPRA needs only 30% of the storage space compared with the LRU and LFU algorithms, which effectively reduces the construction cost of an Internet plus TV platform.

Key words: Internet plus TV, popularity prediction, random forests (RF), caching strategy, curse of dimensionality
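
The idea of scheduling the cache by predicted popularity, as opposed to recency, can be illustrated with a minimal sketch. This is not the PPRA algorithm from the paper: the program names, the request trace, and the `predicted` scores are hypothetical stand-ins for the output of a trained RF model, and the popularity-based cache here is simply pre-filled with the top-ranked programs and never updated.

```python
from collections import OrderedDict


def lru_hits(requests, capacity):
    """Count cache hits for a plain LRU cache holding `capacity` programs."""
    cache = OrderedDict()
    hits = 0
    for program in requests:
        if program in cache:
            hits += 1
            cache.move_to_end(program)      # mark as most recently used
        else:
            cache[program] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used
    return hits


def popularity_hits(requests, capacity, predicted):
    """Count hits when the cache is pre-filled with the `capacity` programs
    that have the highest predicted popularity (static placement)."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    cached = set(ranked[:capacity])
    return sum(1 for program in requests if program in cached)


# Hypothetical request trace and popularity scores (assumptions, not paper data).
requests = ["news", "drama", "news", "film", "news",
            "drama", "doc", "news", "drama", "kids"]
predicted = {"news": 0.9, "drama": 0.7, "film": 0.2, "doc": 0.1, "kids": 0.1}

if __name__ == "__main__":
    capacity = 2
    print("LRU hits:", lru_hits(requests, capacity))
    print("Popularity-based hits:", popularity_hits(requests, capacity, predicted))
```

On this toy trace, occasional requests for rarely watched programs push popular ones out of the LRU cache, while the popularity-ranked cache keeps them resident. This is only a simplified picture of how a prediction-driven cache might reach a given hit ratio with less storage.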

CLC Number: