    Survey of Web Communities Identification (Web社区发现技术综述)

    • Abstract: The Web is a vast information source built from complex hypertext, and it continues to grow rapidly. Finding and using the useful information in such a constantly changing source is challenging. As the Web evolves, large numbers of communities form within it, and these communities carry important information about how the Web is organized. Knowing them helps give an overview of the whole Web, and organizing the Web by community has many advantages: communities can guide users to the information that interests them, help Internet/Intranet service providers organize portals effectively, and help manufacturers find the right consumers. Communities also reflect the social activity of the Web, because the Web is itself a social network. At present, many communities are discovered and maintained manually, which is costly and makes them difficult to update. In addition, many unknown, latent, or newly emerged communities exist that cannot be found manually. This has motivated much research on automatic and semi-automatic community discovery. Community discovery mainly relies on link analysis of the Web graph, and the methods fall roughly into two categories: topic-oriented and non-topic. The two differ in their data sources: the former works on the results a search engine returns for a query term, while the latter works on raw data from a crawler. The field is still new, and many problems remain. This paper analyzes the existing community discovery algorithms fairly comprehensively and summarizes the remaining challenging problems and promising research trends.

       
