    改进的PageRank在Web信息搜集中的应用

    Application of an Improved PageRank in Web Crawler

    • Abstract: PageRank is an algorithm for ranking Web pages that estimates a page's importance from the link structure of the Web, that is, from how pages cite one another. However, it assigns every outlink the same weight and ignores how relevant a page is to the topic, which easily leads to topic drift. After analyzing several PageRank variants, this paper proposes a new PageRank algorithm based on topical blocks. The algorithm segments a Web page into blocks according to the page's structure, passes different PageRank values to the links in each block in proportion to that block's relevance to the given topic, and uses already-visited links as relevance feedback to adjust each block's relevance. Experiments in a Web crawler show that the proposed algorithm improves the precision of the search results.
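
    The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each page is already segmented into blocks with a numeric topic-relevance score, splits a page's PageRank across blocks in proportion to those scores, and shares each block's portion evenly among its links. The names `block_weighted_pagerank` and `update_block_relevance`, the damping factor of 0.85, the even spreading of rank from pages with no usable outlinks, and the exponential-average feedback rule with weight `alpha` are all assumptions made for illustration.

```python
def block_weighted_pagerank(pages, damping=0.85, iterations=50):
    """Hypothetical block-weighted PageRank.

    pages maps url -> list of (block_relevance, [outlink_url, ...]).
    Each block receives a share of the page's rank proportional to its
    relevance; links inside a block split that share evenly.
    """
    # Collect every URL that appears as a source or as a link target.
    nodes = set(pages)
    for blocks in pages.values():
        for _, links in blocks:
            nodes.update(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            # Keep only blocks that have links and positive relevance.
            blocks = [(rel, links) for rel, links in pages.get(u, [])
                      if links and rel > 0]
            total_rel = sum(rel for rel, _ in blocks)
            if total_rel == 0:
                # Assumption: a page with no usable outlinks spreads
                # its rank evenly over all pages.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
                continue
            for rel, links in blocks:
                block_share = damping * rank[u] * rel / total_rel
                for v in links:
                    new_rank[v] += block_share / len(links)
        rank = new_rank
    return rank


def update_block_relevance(old_relevance, visited_link_relevance, alpha=0.5):
    """Assumed feedback rule: move a block's relevance toward the
    relevance observed on a link that the crawler has visited."""
    return (1.0 - alpha) * old_relevance + alpha * visited_link_relevance


# Toy graph: page "a" has a highly relevant block linking to "b"
# and a weakly relevant block linking to "c".
example_pages = {
    "a": [(0.9, ["b"]), (0.1, ["c"])],
    "b": [(1.0, ["a"])],
    "c": [(1.0, ["a"])],
}
example_ranks = block_weighted_pagerank(example_pages)
```

    In this toy graph, "b" ends up ranked above "c" because it is linked from the more relevant block of "a". Under plain PageRank, both outlinks of "a" would carry the same weight.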

       
