    Zhang Xianchao, Xu Wen, Gao Liang, and Liang Wenxin. Combining Content and Link Analysis for Local Web Community Extraction[J]. Journal of Computer Research and Development, 2012, 49(11): 2352-2358.

    Combining Content and Link Analysis for Local Web Community Extraction

    • Most studies on Web community extraction rely on pure link analysis and therefore neglect the textual properties of the Web pages that the hyperlinks connect. This paper proposes an improved algorithm based on Flake's maximum-flow method. Based on the observation that the more similar the contents of two pages are, the more authority they exchange, the lexical similarity of Web pages is used to assign edge capacities. Two assignment methods are introduced: MT (Max-flow + TF-IDF) and MTS (Max-flow + TF-IDF + Seeds). We also propose an efficient ranking scheme that strengthens the differences between community members according to their content similarity to the community topics. When the highest-ranked nodes are chosen as new seeds, taking the lexical similarity between each node and the existing seeds into account ensures the quality of the newly labeled seeds. Experimental results indicate that the content-combined approach handles a variety of data sets effectively, increasing the size and quality of the extracted community and ranking community pages more reasonably.
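The idea of assigning edge capacities from lexical similarity can be illustrated with a minimal sketch. It is not the paper's MT or MTS assignment, whose exact formulas the abstract does not give. It assumes a Flake-style construction (a virtual source tied to the seeds, a unit-capacity edge from every non-seed page to a virtual sink) and assumes that each link's capacity is the number of seeds times the TF-IDF cosine similarity of its two pages. All function names, the `sink_cap` parameter, and the toy pages are hypothetical.

```python
import math
from collections import Counter, defaultdict, deque

def tfidf_vectors(docs):
    """Return {page: {term: weight}} with weight = tf * (log(N / df) + 1)."""
    n = len(docs)
    tokens = {p: text.lower().split() for p, text in docs.items()}
    df = Counter(t for toks in tokens.values() for t in set(toks))
    return {p: {t: c * (math.log(n / df[t]) + 1.0)
                for t, c in Counter(toks).items()}
            for p, toks in tokens.items()}

def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def min_cut_source_side(cap, s, t):
    """Run Edmonds-Karp max flow; return nodes on the source side of the min cut."""
    res = defaultdict(lambda: defaultdict(float))
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] += c
            res[v][u] += 0.0  # make sure the reverse residual edge exists
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent) - {s}  # reachable nodes form the community
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck

def extract_community(docs, links, seeds, sink_cap=1.0):
    """Content-weighted max-flow community extraction (illustrative only)."""
    vec = tfidf_vectors(docs)
    cap = defaultdict(dict)
    for u, v in links:
        # Assumed capacity rule: |seeds| * lexical similarity, both directions.
        c = len(seeds) * cosine(vec[u], vec[v])
        cap[u][v] = cap[v][u] = c
    for seed in seeds:
        cap["_source"][seed] = math.inf
    for page in docs:
        if page not in seeds:
            cap[page]["_sink"] = sink_cap
    return min_cut_source_side(cap, "_source", "_sink")

# Toy graph: three pages on one topic, two on another, linked in a chain.
docs = {
    "a": "jazz music saxophone", "b": "jazz music saxophone",
    "c": "jazz music saxophone",
    "x": "car engine repair", "y": "car engine repair",
}
links = [("a", "b"), ("b", "c"), ("c", "x"), ("x", "y")]
print(sorted(extract_community(docs, links, {"a", "b"})))  # ['a', 'b', 'c']
```

In this toy graph the link between `c` and `x` gets zero capacity because the two pages share no terms, so the minimum cut separates the on-topic pages from the off-topic ones even though a hyperlink joins them.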
