    An Efficient Collaborative Filtering Algorithm Based on Graph Model and Improved KNN

      Abstract: With the rapid development of the Internet, recommender systems have become an effective means of dealing with information overload: they ease the difficulty users face when filtering information and help them discover valuable content. Collaborative filtering is widely used in recommender systems because it is domain independent and helps users discover latent interests. Because the user-item rating matrix is both large and sparse, however, current collaborative filtering methods still leave considerable room for improvement in real-time performance and recommendation precision. This paper proposes the GK-CF method, which builds a graph-based rating data model and combines traditional collaborative filtering with graph computation and an improved KNN algorithm. Using message propagation over the graph and an improved similarity model, candidate users are filtered first, and similarities are computed only for the remaining candidates. Based on the node structure of the user-item bipartite graph, a shortest-path algorithm quickly locates the items to be rated. The algorithm is further parallelized and optimized on a parallel graph framework. Experiments on a physical cluster show that, compared with existing collaborative filtering algorithms, GK-CF improves recommendation precision and rating-prediction accuracy, and offers good scalability and real-time performance.
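      The general idea of filtering candidate users through the bipartite graph before computing similarities can be illustrated with a minimal Python sketch. It is not the GK-CF implementation: the abstract does not specify the improved similarity model or the parallel framework, so plain cosine similarity over co-rated items is used as a stand-in, and all function names (`build_item_index`, `candidate_neighbors`, `cosine_on_common`, `predict`) and the parameter `k` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_item_index(ratings):
    """Invert user -> {item: rating} into item -> set of users,
    i.e. the item side of the user-item bipartite graph."""
    item_users = defaultdict(set)
    for user, items in ratings.items():
        for item in items:
            item_users[item].add(user)
    return item_users


def candidate_neighbors(user, ratings, item_users):
    """Users two hops away (user -> item -> other user).
    Only these share a rated item, so only they need a similarity score."""
    found = set()
    for item in ratings[user]:
        found |= item_users[item]
    found.discard(user)
    return found


def cosine_on_common(a, b):
    """Cosine similarity over co-rated items.
    Assumption: a stand-in for the paper's improved similarity measure."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user, ratings, k=2):
    """Predict ratings for items the user has not rated, using the
    k most similar candidate neighbors (similarity-weighted average)."""
    item_users = build_item_index(ratings)
    sims = {
        v: cosine_on_common(ratings[user], ratings[v])
        for v in candidate_neighbors(user, ratings, item_users)
    }
    top_k = sorted(sims, key=sims.get, reverse=True)[:k]
    # Candidate items lie three hops away: user -> item -> neighbor -> unseen item.
    totals, weights = defaultdict(float), defaultdict(float)
    for v in top_k:
        for item, rating in ratings[v].items():
            if item not in ratings[user]:
                totals[item] += sims[v] * rating
                weights[item] += sims[v]
    return {i: totals[i] / weights[i] for i in totals if weights[i] > 0}
```

      In this sketch, a user who shares no rated item with the target user is never scored, which is the source of the savings on a sparse rating matrix; the hop counts play the role that the shortest-path step plays in the abstract's description.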

       
