Abstract:
Internet plus TV tends to excessively consume storage space to achieve higher cache hit ratio. A novel cache schedule algorithm called PPRA(popularity prediction replication algorithm) is proposed in this paper based on programs popularity forecast. Firstly, according to statistical analysis from actual measurement, we apply random forests (RF) algorithm to construct a forecasting model of programs popularity. Subsequently, we use the principal component analysis (PCA) to overcome dimensionality curse and accelerate the forecasting process. Finally, we validate PPRA with authentic behavior data of a certain cable operator’s 1.3 million users in a period of 120 days. Our experimental results show that PPRA only consumes 30% storage space to achieve a fixed cache hit ratio compared with LRU and LFU algorithms, therefore the cost of Internet plus TV platform is saved.