Abstract:
The World Wide Web is a complex collection of hypertext that is expanding at tremendous speed, so finding and using the information it holds is a challenging task. As the Web evolves, many communities emerge within it. These communities carry important information about how the Web is organized, and knowing them helps in forming an overview of the Web as a whole. Organizing the Web into communities has many advantages: users can navigate to the information that interests them, Internet/Intranet service providers can arrange efficient portals, and manufacturers can find the right consumers. Communities also reflect the social nature of the Web, because the Web is a social network. At present, many communities are found and maintained by human effort, which is costly and makes them difficult to keep up to date. Moreover, many communities are still unknown or newly emerged, and it is impossible to find them all manually. This has motivated much research on automatic and semi-automatic discovery techniques. Community extraction methods fall into two categories, topic-oriented and non-topic, which differ in their data sources: the former use the results returned by a search engine for a query term, and the latter use raw data collected by a crawler. The field is still new, however, and many problems remain open. This paper analyzes the current community-finding algorithms and describes the challenging problems and promising research trends.